SEQUENTIAL EXPERIMENTATION*
R. A. FISHER


Department of Genetics 
Cambridge 


The present use of the term sequential is intended to be of a broader
import than the formal use of the word as associated with the syste- 
matic procedure known as sequential analysis. The experimenter does 
not regard his material as wholly passive but instead looks to what may 
be learnt from it with a view to the improvement and extension of the 
enquiry. This willingness to learn from it how to proceed is the essential 
quality of sequential procedures. Wald introduced the sequential test, 
but the sequential idea is much older. For example, what is the policy 
of a research unit? It is that in time we may learn to do better and follow
up our more promising results. The essence of sequential experimenta- 
tion is a series of experiments each of which depends on what has gone 
before. For example, in a sample survey scheme, as explained by Yates,
a pilot survey is intended to supply a basis for efficiently planning the 
subsequent stages of a survey. Again, successively smaller units may be 
chosen at various stages and the chance of what is chosen at a particular 
stage depends on what has gone before. 

These ideas are illustrated by two concrete examples which naturally 
arise in genetic work. One of these is quite near to Wald’s sequential 
testing. We are indebted to Wald for laying down a detailed procedure 
in advance specifying how we shall react in all cases, so giving to the 
sequences the same unity as a single act of sampling possesses, and enabling
the ordinary notions of estimation, efficiency and sufficiency to be
applied to sequential experimentation. The chief interest in the first 
problem will be in its comparison with and slight deviations from Wald’s 
procedure. 

Suppose that there is evidence of a lethal factor in heterozygotes 
linked with a visible factor such as black or brown color in the house 
mouse. Let us represent 


B      black
b      brown
l      lethal
and +  absence of lethal factor.


*Notes on a lecture delivered on June 18, 1952 at Institute of Statistics Confer- 
ences, Blue Ridge, N. C. 
Then in the mating
B+/bl × B+/bl


the brown mice occur with a frequency much less than one quarter. 
There is another gene “misty”, about ten units from the black/brown 
locus, and we may wish to ascertain the location of the lethal gene rela- 
tive to brown and misty. Omitting the genetical details, a useful mating 
would be 

BlM/b+m × blm/b+m,


which gives two brown mice to one black if linkage is close, and it will 
be necessary to test the Bl/b+ mice before use. A sequential test may
be used to find such animals soon enough, and only males would be 
tested as females cannot be tested sufficiently in time. The other test 
mating is 

B+/b+ × bl/b+


which gives equal numbers of black and brown, and we wish to eliminate 
such males. This differs from the usual sequential test in that our object 
is the practical one of securing material of a particular kind, rather than 
the theoretical one of determining the proportion of each kind which is 
present. I will suppose that we have an ample supply of mice to be 
tested, so that we need have no hesitation about discarding any indi- 
vidual, and there is no interest in minimising the probability of rejecting 
a mouse of the desired kind. The only limitations will be on the number 
of cages which have been provided for the females. 

If at some stage we have counted x brown and y black mice, the 
probability of reaching that stage depends on whether we have used a 
2 : 1 or 1 : 1 mouse. The probability in the case of a 2 : 1 mouse is


$$P_1 = C \left(\tfrac{2}{3}\right)^x \left(\tfrac{1}{3}\right)^y,$$


where C is the number of paths to reach that stage, and is like a binomial
coefficient, but not equal to it for in the sequential test certain paths are
blocked by elimination or acceptance.

The probability for a 1 : 1 mouse is


$$P_2 = C \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^y.$$


Taking logarithms

$$L_1 = \log C + x \log\tfrac{2}{3} + y \log\tfrac{1}{3}$$
$$L_2 = \log C + x \log\tfrac{1}{2} + y \log\tfrac{1}{2}$$
$$L_1 - L_2 = x \log\tfrac{4}{3} - y \log\tfrac{3}{2}.$$
Lines of equal likelihood ratio would be parallel lines if x and y were
continuous variates. Actually we can take cells (x,y) as shown in the 
figure and the acceptance or rejection contours will be broken lines like 
those shown in the figure. 

Suppose that the initial frequency of 1 : 1 and 2 : 1 mice was 9 : 1.
Then the frequency at the point (x, y) will be $9e^{L_2} : e^{L_1}$ and the ratio of
good to bad will be


$$\tfrac{1}{9}\, e^{L_1 - L_2}.$$


When $L_2 > L_1$ the quality is worse than in the original material, and
we can reject it right away. In the diagram, the solid line divides the 
positive likelihood region from the negative likelihood region. 

Suppose we want the ratio of the desired to the undesired type to be 
5 : 1 in the selected animals. Then, since 5 × 9 = 45,


$$L_1 - L_2 = \log 45,$$


shown by the double line. Mice below this line corresponding to 
$L_1 - L_2 > \log 45$ would be accepted as being at least 45 times as likely
to be desirable as the original ones. We could choose any other ratio, 
depending on our anxiety regarding the use of the wrong genotype. 
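
The acceptance and rejection rule just described can be checked numerically. The following is a minimal sketch, assuming natural logarithms, x counting brown young and y counting black young, and the 9 : 1 initial frequency and 5 : 1 target quoted above; the function names are illustrative and do not come from the lecture.

```python
import math

def log_likelihood_difference(x, y):
    """L1 - L2 for x brown and y black young (the path count C cancels)."""
    return x * math.log(4 / 3) - y * math.log(3 / 2)

def ratio_good_to_bad(x, y, initial_bad_to_good=9.0):
    """Ratio of 2:1 (desired) to 1:1 (undesired) mice at the cell (x, y)."""
    return math.exp(log_likelihood_difference(x, y)) / initial_bad_to_good

def decide(x, y, target_ratio=5.0, initial_bad_to_good=9.0):
    """Return 'accept', 'reject' or 'continue' for the cell (x, y)."""
    diff = log_likelihood_difference(x, y)
    if diff > math.log(target_ratio * initial_bad_to_good):
        return "accept"    # L1 - L2 > log 45 with the default settings
    if diff < 0:
        return "reject"    # L2 > L1: worse than the original material
    return "continue"
```

Under these assumptions a male whose first young is black is rejected at once, while an unbroken run of brown young must reach fourteen before the male is accepted.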

If the young were examined one at a time, the number of paths to a 
particular cell in the diagram can be written down in a manner similar 
to that used by Fermat and Pascal to write the “arithmetical triangle”.
This would provide the value of C for any cell. We could find the fre- 
quencies with which the two types are distributed over the area, and 
they would be crossing over the lower boundary line at all points in the 
ratio of about 5: 1. In addition, we could divide the intermediate area 
into zones with lines of equal likelihood. This provides the possibility 
of testing more intensively those more probably desirable and hence of 
hastening the finding of acceptable mice in a mass of unacceptable 
material. 
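
The modified arithmetical triangle can be built by passing path counts forward one birth at a time and letting no path leave a cell at which testing has stopped. The sketch below is an illustration under assumed stopping rules (accept when $L_1 - L_2 > \log 45$, reject when $L_1 - L_2 < 0$); the function name `count_paths` and the dictionary layout are hypothetical.

```python
import math

def count_paths(max_births, accept_log=math.log(45)):
    """Number of paths C reaching each cell (x, y), young examined one at a time.

    Cells where testing stops (accepted or rejected) receive paths
    but pass none on, so C falls below the binomial coefficient.
    """
    def diff(x, y):
        return x * math.log(4 / 3) - y * math.log(3 / 2)

    def stopped(x, y):
        d = diff(x, y)
        return d > accept_log or d < 0

    paths = {(0, 0): 1}
    for n in range(max_births):
        for x in range(n + 1):
            y = n - x
            c = paths.get((x, y), 0)
            if c == 0 or stopped(x, y):
                continue
            paths[(x + 1, y)] = paths.get((x + 1, y), 0) + c  # next young is brown
            paths[(x, y + 1)] = paths.get((x, y + 1), 0) + c  # next young is black
    return paths
```

With these assumed boundaries the cell (3, 1) is reached by 2 paths rather than the 4 given by the unrestricted binomial coefficient, because the paths through the rejected cells (0, 1) and (1, 1) are blocked.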

I shall now consider a different type of sequential experimentation. 
Geneticists are often expected to advise on the improvement of dairy 
cattle. The naive geneticist may think that he can accomplish this 
merely by selecting high producing cows, though it is really much more 
difficult than that. Now it is possible to visualise a situation in which 
one big cow is equal in all respects to two little cows, including food 
consumption. Then it would be possible for our geneticist with 50 
million dollars of public funds to expend over a 20 year period to develop 
a cow which produces twice as much milk as the animals from which he 
started. The geneticist would then say that his breeding program has 
been successful in doubling the yield of milk, and “if only farmers were 
keeping as many cows as they used to, the milk supply would be doubled.
Unfortunately, the farmers are now keeping only half as many cows.” 
This is an extreme case with a fairly obvious moral. 
If we select a type only for milk yield and at random for other characters
then we are likely to injure it in other respects. When you select for
one character you affect everything else. You need to select on all 
valuable characters to avoid injuring the stock, for example, selecting 
for milk yield alone may decrease longevity. 

What do we mean by measuring the milk yield of a cow? For a long 
time now milk yield records have been available. Such records may have 
been advantageous, but in many cases certainly they have been abused. 
There have even been cases where the milk produced has been fed back 
to the cows with a view to obtaining record yields. 

Many physiologists believe that it is possible to make a clean sepa- 
ration of Science and Economics, in that they can describe the reactions 
of an animal in physical terms, and leave it to the farmer to judge what 
uses of it are profitable. I do not believe such a clean separation can be 
made. Where economics comes in is that for any machine there will be 
conditions of favorable working and unfavorable working which are 
within the choice of the operator of the machine. To choose among 
machines we must test each when operated in the most profitable way. 
In feeding cows there will be more or less profitable operations open to 
the farmer and the scientific problem becomes definite when these alternatives
are recognized. This becomes technically very difficult. How
can we ensure that an individual cow is tested in conditions which are 
most profitable for that individual? How can we induce the cow to tell 
us how she ought to be fed to give us the maximum profit? This can be 
attempted in an experimental situation by using the accepted feeding 
standards to establish a basic rate of feeding for the individual animal 
taking account of the shape of the lactation curve, and the maintenance 
requirement of the animal. This provides a feeding curve to which may 
be added several parallel curves both above and below it, separated by 
intervals corresponding to differences in feed which are the economic 
equivalent of about one gallon of milk a week. 

We can make the following index table for the individual cow, where 
a is the overhead cost and z is the cost per unit of milk. 


Week    Food x (in cents)    Milk y    Cost per unit (x + a)/y = z
1       x1                   y1        z1
2       x2                   y2        z2
3       x3                   y3        z3
4       x4                   y4        z4
5       x5                   y5        z5
6       x6                   y6        z6

If the cost per unit in the second week is less than the average in the 
first and third weeks, then in the following week we should go back to 
the same feeding level as in the second week. Comparisons and decisions 
may be made by looking at successive sets of three records on z. If cost 
per unit for the third week is more than the average for the second and 
fourth weeks we again raise the feeding level. No comparison is then 
made at the end of the fifth week and the sixth week feeding level is 
taken to be that of the fourth. We may go wrong in a procedure of this 
kind, but it should be tried and scrutinized. There should be some such 
method of using animal reactions to guide experiments in milk yield. 
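
The arithmetic behind one comparison of three successive records can be sketched as follows. This covers only the cost index z = (x + a)/y and the test of a middle week against the average of its neighbours; how the feeding level is then moved is left to the rule in the text. The function names and any figures passed to them are hypothetical.

```python
def cost_per_unit(food_cost, milk, overhead):
    """z = (x + a) / y: food cost x plus overhead a, per unit of milk y."""
    return (food_cost + overhead) / milk

def middle_week_is_cheaper(z_prev, z_mid, z_next):
    """True when the middle week's cost per unit is below the average
    of the weeks on either side of it."""
    return z_mid < (z_prev + z_next) / 2
```

For instance, with food costing 300 cents, an overhead of 100 cents and 20 units of milk, the index is 20 cents per unit.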

I would call this a sequential test of milk yield. It has the advantage 
that we do get cost at the end and that we can then give preference to 
those animals which produce most cheaply. I suggest that the use of 
identical twins may be of valuable assistance in estimating the precision 
of the trial. 
SEQUENTIAL MEDICAL PLANS


I. Bross 


Biostatistics Department 
School of Public Health and Hygiene 
Johns Hopkins University 


In clinical medical experimentation the disease under study may
be relatively rare and it may be necessary to continue the study over
months or even years in order to acquire a respectable series of patients.
In the last few years statistical techniques have been developed which
apply to situations where observations are gathered over time. Although
such techniques seem well suited to the needs of clinical research (at least 
in principle), it is necessary to adapt these methods to the thorny special 
problems of medical research. The procedures discussed in this paper 
are an attempt to make this adaptation—an attempt which is neces- 
sarily a preliminary one. 

The presentation is intended for medical research workers. 


1. Sequential Analysis 


In this article I am going to discuss a new technique, developed by 
statisticians, that seems to be especially well suited to medical experi- 
mentation. The method is called “Sequential Analysis”. Despite the 
formidable title, the method is often very easy to use. 

It offers three important advantages over the usual (“fixed sample
size”) statistical procedures. First of all it allows for analysis of the data
as it comes in (instead of waiting until the end of the experiment for the 
analysis). Second it may allow an appreciable reduction in the amount 
of data that has to be collected to reach statistically valid conclusions. 
Finally, it may eliminate ALL computation on the part of the research 
worker! 

Sequential Analysis was developed during the second world war by 
A. Wald (1) but was classed as a military secret. Since the end of the 
war it has been used in various industrial applications and in a few 
scientific investigations. 

In this article we shall focus attention on the application of Sequential 
Analysis to a specific, but rather common, type of medical experiment.
I do not want to give the impression, however, that the method is limited 
to the particular situation discussed here; it is a general technique.

The problem to be considered is the case where we wish to compare 
two “treatments”. These “treatments’’ may be two drugs, two diets, 
or two operative techniques. To distinguish between the two treat- 
ments we shall call one the “new” treatment and the other the “old” or 
“standard” treatment. This terminology is merely a matter of convenience;
the method applies to other comparisons. For example, the “old”
treatment might actually be a control group which received no treatment. 

We shall only consider the case where the outcome of the treatment 
is an “all-or-none” classification such as “lived-died” or “cured-not
cured”. One limitation on the design of the experiment is that the
individuals in the test series are paired. 

In this discussion we shall consider the case where, as patients come 
into the experiment, they are placed alternately on the “new” and on
the “old” treatments. Two consecutive patients therefore constitute a 
pair. The methods apply equally well to other pairings, for example 
pairing by age, sex, severity of disease, or other relevant factors. While 
we shall refer in this discussion to a “series of patients” it should be 
evident that the same methods apply to other experimental material 
such as animals, cultures, etc.

Let us suppose that a doctor is running an experiment to test two 
operative techniques and that as patients are admitted with a specific 
diagnosis they are assigned alternately (or, preferably, by randomiza- 
tion) to the “old” or the “new” technique. Each pair of consecutive 
patients can be regarded as a separate “little” experiment, an individual
test of the efficacy of the two operative techniques. For this little 
experiment there are four possible outcomes: 


Outcome | Outcome for Patient   | Outcome for Patient
Number  | on the old treatment  | on the new treatment

1       | Cured                 | Cured
2       | Not Cured             | Not Cured
3       | Cured                 | Not Cured
4       | Not Cured             | Cured


Now what can be learned from this little experiment about the rela- 
tive efficacy of the two treatments? If both treatments lead to the same 
result then the experiment provides no information as to which treatment
is superior.¹ Hence for our purposes these “ties” can be dropped


¹“Tied” experiments may provide information on other questions (see Sec. 5).
altogether. In other words if outcomes 1 or 2 occur we can consider that
we are no further along in our experiment. If outcome 3 occurs then 
this constitutes a small piece of evidence that the “old” treatment is 
superior. If outcome 4 occurs then this is some evidence that the “new” 
treatment is superior.

We can now regard the whole experiment as a sequence of these 
“little” experiments (which is the origin of the name “Sequential
Analysis”). Suppose that a number of “little” experiments have been
performed. Then the research worker may pause and ask himself: 
“Should the experiment be stopped or continued?” This is the natural 
thing to do, but when the experimenter asks this question he is really 
making a Sequential Analysis. All that the statisticians have done is to 
formalize this step and provide a systematic procedure for answering this 
question. 

The experimenter at any stage in his sequence of “little” experiments
has two choices. He can stop the experiment and make some recom- 
mendation (which can be called a terminal decision). Alternatively he 
can postpone the recommendations and continue the experiment. The 
appropriate decision will depend on what has happened up to that point 
in the sequence of “little” experiments.

If “new” has won every time then it might seem plausible to stop the 
experiment and recommend the “new” treatment. Similarly if “old” 
has been consistently victorious it would be reasonable to stop the 
experiment and recommend the “old” treatment. If the experiment 
indicates a predominance of one or the other treatment, but the evidence 
is not quite convincing then the research worker might want to continue 
the experiment in the hope of achieving clear cut evidence. 

There remain further possibilities. The “new” and the “old” may 
have each won about the same number of times. In this event it might 
be sensible to stop the experiment because it would appear that there is 
not enough difference between the two treatments to warrant further 
work to determine which one is superior. 

A sequential scheme allows these decisions to be made in a more or 
less automatic fashion. To operate such a plan requires that, at each 
step in the experiment, the evidence should be summarized in order to 
determine whether or not to continue the experiment. If the decision 
is made to stop the experiment then simple rules must be set up to de- 
termine just which terminal decision to select, to decide whether to 
recommend one of the two treatments or to say that they are about 
equally good.²


²One possibility, not explored in this paper, would be to allow conditional recommendations.
Although this might sound like a difficult procedure to carry through,
as will be shown later the method can be reduced to a series of rules that 
require NO computation on the part of the research worker—not even 
addition!

The main distinction between a sequential scheme of this sort and 
the usual “fixed sample size” statistical methods is that in a sequential 
plan the length of the series required will depend on what sort of data is 
obtained throughout the course of the experiment. In general a sequen- 
tial plan will require a shorter series on the average than the usual sta- 
tistical methods. 

The reason for this is that the experiment will “cut off” automatically 
just as soon as sufficient evidence is obtained to make a recommendation.
Similarly if the experiment seems to be inconclusive and no advantage is 
evident for either treatment, then the experiment will be stopped. Con- 
sequently no more data needs to be collected than is necessary to do the 
job, and the resulting saving is reflected by the fact that the series 
required will, on the average, tend to be shorter. 


2. Sequential Charts 


The job of the research worker in a sequential scheme can generally 
be reduced to the following task: The doctor makes a chart of the progress
of the experiment on an ordinary piece of graph paper. The “path” of 
the experiment is actually plotted by using the following very simple 
rules: 


Plotting Rules 


(1) The outcome of each “little” experiment is plotted as
soon as the results are known.

(2) If the “new” and the “old” treatments lead to the same
outcome (both cured, both not cured) then no information about
superiority is obtained and nothing is plotted on the chart.

(3) If the “old” treatment effects a cure while the “new”
treatment fails to cure, then mark an “x” in the square to the
right of the last entry on the chart.

(4) If the “new” treatment effects a cure while the “old”
treatment fails to cure, then mark an “x” in the square above the
last entry on the chart.
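
The four plotting rules can be sketched in a few lines of Python. This is an illustration only: the function name `plot_path`, the encoding of each pair as two booleans, and the (east, north) coordinates counted from the starting square are assumptions introduced here, not part of the original chart.

```python
def plot_path(pairs):
    """Apply the Plotting Rules to a series of pairs.

    pairs: iterable of (old_cured, new_cured) booleans, one per "little" experiment.
    Returns the squares marked with an x, as (east, north) offsets from the start.
    """
    east, north = 0, 0
    path = []
    for old_cured, new_cured in pairs:
        if old_cured == new_cured:
            continue          # rule 2: a tie, nothing is plotted
        if old_cured:
            east += 1         # rule 3: "old" cures, "new" fails: one square right
        else:
            north += 1        # rule 4: "new" cures, "old" fails: one square up
        path.append((east, north))
    return path
```

A series such as old-wins, tie, new-wins, old-wins would then be plotted as the three squares (1, 0), (1, 1), (2, 1), the tie leaving no mark.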


These instructions will be clearer if the chart in Figure 1 is considered. 
The experiment begins in the square marked “start” (which corresponds
to the situation where there is no information about the relative merits
of the two treatments). 

The pattern of x’s marked in Figure 1 corresponds to the following
experimental results (ties have been discarded). 


Order of
Experiment    Outcome “old”    Outcome “new”
1             Cured            Not Cured
2             Cured            Not Cured
3             Not Cured        Cured
4             Cured            Not Cured
5             Cured            Not Cured

[Figure 1 chart: Top Barrier, Middle Barrier, Bottom Barrier, and the path of x’s]


FIGURE 1—HYPOTHETICAL SEQUENTIAL EXPERIMENT 
The following remarks derive from these rules:


(1) Every possible experiment is represented by some path of x’s
on the chart. 

(2) When the path goes “up” sharply this tends to indicate that the 
“new” treatment is superior. 

(3) When the path is horizontal this tends to indicate that the “old” 
treatment is superior. 

(4) When the path goes up at a 45 degree angle this indicates that 
the two treatments are equally good. 

The next important notion is the concept of a “barrier”. When the 
path of x’s crosses a barrier the experiment comes to an end. 

One can think of the path as the route taken by a robot who can only 
walk East or North and the barrier as representing the edge of a cliff. 
Once the robot steps over the barrier, that is the end of the experiment. 

The path in Figure 1 goes over the barrier at the last step, so the 
experiment is terminated. The appropriate recommendation to make 
depends on which barrier is crossed. In Figure 1 it is the bottom barrier 
so the recommendation would be to stick to the “old” treatment. 

The doctor can therefore use the following simple rules for deciding 
what to do at any stage in the experiment: 

(1) If no barrier has been crossed, continue the experiment. 

(2) If a barrier has been crossed, stop the experiment. Make the 
recommendation appropriate to the barrier that has been violated (this 
will ordinarily be marked on the chart).

These simple instructions allow the experimenter to make an analysis 
at each step in his experiment, and to decide the appropriate action to 
take next. As you can see Sequential Analysis requires very little effort 
on the part of the experimenter. 


3. The Choice of a Sequential Scheme 


In the next section I am going to present two sequential plans. These 
plans are intended as examples; they will only fit rather specific experimental
situations. In general the sequential plan must be “tailored” to
meet the needs of the research worker. I might add that although the 
use of a plan requires no work by the experimenter, the construction of 
a plan may take some sweat and strain on the part of the statistician. 

The choice of an appropriate sequential plan may be made if the 
research worker will set up a series of specifications for the plan. I now
want to outline the procedure for setting the specifications. 

First of all it should be realized that statistical methods are a form of 
insurance. If the research worker makes recommendations he wants 
some assurance that future experiments will confirm his recommendations.
He would not want to proclaim the virtues of a new technique or drug
only to have this treatment turn out later to be worthless.
Statistical techniques can not provide ABSOLUTE protection, because 
there will always be “sampling error” in any experiment. However, 
statistical methods can insure that the risk of error is not greater than 
some given amount. Just how much risk of error the research worker is 
willing to take is UP TO HIM. 

A common level of protection in medical research is the “5% level”.
If a doctor specifies this level this means that he is willing to make 
erroneous statements (of a given type) only once in twenty times (in the 
long run). The protection level can be set by the research worker BUT
the greater the protection demanded (i.e. the smaller the risk) the greater 
the cost (i.e. the longer the series that will have to be collected). 


Step 1. The research worker must decide what protection 
level he wants against the error of recommending the “new” 
treatment when, in fact, it is no better than the “old” treatment. 
The risk of error is specified by a number such as 5%, 10%, etc.
This number will be called the “Type I level”.


In order to set up a plan some information is necessary about the 
proportion of cures effected by the standard treatment. This information
does not have to be very precise, merely an order of magnitude,
that is, an intelligent guess.


Step 2. The research worker guesses the proportion of cures 
by the “old” treatment (which will be called “$p_1$”).


Now the doctor is called upon to do a little soul-searching. He must 
try to answer the question: ‘“What proportion of cures by some alterna- 
tive treatment would constitute a really important advance?” Perhaps 
the doctor would feel that any advantage, no matter how slight, from 
the new treatment would be of interest, but this is not a useful answer. 
What is desired is a hypothetical proportion of cures such that if the 
“new” treatment did this well, and if, due to sampling errors, the doctor
failed to recommend the “new” treatment, then the doctor would really
feel like kicking himself!
Step 3. The research worker specifies what proportion of
cures would be necessary if an alternative treatment were to be
considered an important medical gain (call this quantity “$p_2$”).


A second kind of error which might be made would be to fail to 
recommend a treatment which could cure a proportion $p_2$ of the patients. The
experimenter would also want insurance against this type of error. 


Step 4. The research worker must decide what protection 
level he wants against the error of failing to recommend the 
“new” treatment when, in fact, the “new” treatment represents 
an important medical advance. The risk of this type of error 
would also be specified by a number such as 5%, 10%, etc. This
number will be called the “Type II level” to distinguish it from
the number in Step 1. 


In these four steps the research worker provides the specifications 
for the sequential plan. Ideally the Statistician would then “tailor” a 
sequential chart that would meet these specifications. In practice it is 
not usually expedient to “fit” the scheme so carefully and the statistician
would, instead, go through a file of ready-made schemes and select one 
which more or less conformed to the specifications. Notice that four 
numbers are enough to specify the scheme rather completely. 

One or two remarks might clarify these steps in specifying the plan. 

In Step 3 the experimenter really specifies the magnitude of the 
treatment differences that he is after. This situation is quite analogous 
to netting fish. If the fisherman is going after tuna he can use a coarse
net (short series) but if he is out for sardines he will have to use a fine net 
(long series). A sardine net is not efficient for tuna, and vice versa. 
In the same way the research worker has to choose the right sequential 
“net” for his particular job.

The research worker who wants to limit his study to a fairly short 
series of patients must realize that, while he may “catch” big advantages 
between treatments, the small advantages are likely to “get away”— 
they will not be detected by his experiment. 

Another thing worth noticing is that there are at least two different 
types of errors that can be made. I am afraid that, until fairly recently, 
the statisticians have concentrated on the first type of error and have 
rather tended to neglect the other. This second type of error, the
“missing” of advantageous new treatments, may be just as serious in 
medical experimentation as the error of claiming non-existent advantages 
for a treatment. 

In many statistical techniques the risk of the first type of error is fixed
at the small number, 5%, and, in short series, this sometimes means that 
the risk of the second type of error may be very large—often there is 
better than a fifty-fifty chance that an important advantage will be 
missed altogether! In my opinion the choice of the 5% level may be 
quite unrealistic and there are many experimental situations where it 
would be preferable to have more nearly balanced risks. This might 
mean that the Type I risk would be 10% or even 20% instead of the 
traditional 5%. 
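
The size of the second risk in a short series can be illustrated with an exact binomial calculation. The sketch below is not one of the sequential plans: it assumes an ordinary fixed-sample, one-sided test on n untied pairs at the 5% level, and it assumes, purely for illustration, that a worthwhile new treatment wins 70% of the untied pairs. The function names are hypothetical.

```python
from math import comb

def upper_tail(n, k, p):
    """P(X >= k) for X binomial with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def type_two_risk(n, alpha=0.05, p_alt=0.7):
    """Fixed-sample test on n untied pairs, X = number won by the new treatment.

    Returns (k, risk): the smallest k with P(X >= k | p = 1/2) <= alpha,
    and the chance of missing the advantage when the new treatment
    really wins a proportion p_alt of untied pairs.
    """
    k = next(k for k in range(n + 2) if upper_tail(n, k, 0.5) <= alpha)
    return k, 1 - upper_tail(n, k, p_alt)
```

Under these assumptions, with ten untied pairs the new treatment must win at least nine of them, and an advantage of this size would be missed about 85% of the time.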


4. Two Sequential Plans 


I have prepared two plans which are designed to serve as examples of 
sequential schemes. I have tried to choose the specifications so that 
these plans would be useful in medical experiments. Both plans are 
designed to detect “medium-sized” advantages in the new treatment.
The main distinction between the two plans is that Plan A fixes the 
Type I level at 10% while Plan B fixes the Type I level at the traditional 
5%. Both plans require fairly long series. 

Plan A is given in Figure 2. You will note that there are THREE 
different barriers in this scheme. Each barrier corresponds to a different 
terminal decision. We can think of these decisions either as verbal 
statements or as actions, and the appropriate terminal decisions are: 


DECISION RULES 


Barrier | Statement                           | Action

Top     | The new treatment is superior       | Use the new treatment
Middle  | There is no “important” difference  | Use the treatment which is most
        | between the treatments              | convenient
Bottom  | The old treatment is superior       | Use the old treatment


As long as the path of the experiment does not cross a barrier the 
action, of course, is to continue the experiment. 
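
The way a chart is read can be sketched as a simple lookup. The barrier squares used in the usage note are a toy arrangement invented for illustration; they are not the barriers of Plan A or Plan B, which are defined only by the published charts. The names `RECOMMENDATION` and `follow_path` are likewise hypothetical.

```python
RECOMMENDATION = {
    "top": "use the new treatment",
    "middle": "use the treatment which is most convenient",
    "bottom": "use the old treatment",
}

def follow_path(path, barrier_squares):
    """Walk the plotted path and stop at the first barrier square.

    path: list of (east, north) squares in the order they were marked.
    barrier_squares: dict mapping a square to 'top', 'middle' or 'bottom'
                     (hypothetical; a real plan supplies its own chart).
    Returns (number of untied pairs used, action).
    """
    for step, square in enumerate(path, start=1):
        barrier = barrier_squares.get(square)
        if barrier is not None:
            return step, RECOMMENDATION[barrier]
    return None, "continue the experiment"
```

With the toy barriers `{(4, 1): "bottom", (1, 4): "top", (3, 3): "middle"}`, a path of four steps right and one step up ending at (4, 1) stops at its fifth untied pair with the action "use the old treatment".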
You will also note that Plan A is “closed,” that is the barriers close
in so that eventually the path of the experiment must cross one of the 
barriers. Some sequential plans are “open” and in such plans it would 
be possible for the experiment to continue indefinitely without crossing 
a barrier. Open sequential plans do not seem appropriate for medical 
experimentation for obvious psychological reasons. 

Plan B is given in Figure 3 and is similar to Plan A, in that there are 
three barriers and the plan is “closed.”

To use either Plan A or Plan B: 

(1) The patients must be “paired” by assigning them alternately to 
the two treatments (or in some other fashion). 

(2) The outcome of each “little” experiment on a pair of patients 
must be plotted on the chart in accordance with the Plotting Rules given 
in Section 2. 

(3) The conduct of the experiment and the eventual decisions are 
made in accordance with the Decision Rules given above.

For reasons which will be discussed later it will generally be desirable 
to keep a separate record of the “little” experiments which are “ties.”
A count should be kept of the number of cases where both treatments 
effect cures and also the number of cases where both treatments fail to 
cure. 

Before you would want to use either of the plans you would want to 
know more about the specifications and also something about the length 
of the series involved in these plans. 

To explain the specifications I will have to make a few remarks about 
the way in which the plans were constructed. Since we only consider 
those “little” experiments where the two treatments lead to different
outcomes the theoretical structure is very simple. 

[FIGURE 3: PLAN B]

Let us see what happens when (a) the proportion of patients cured
by the “old” treatment is $p_1$ and (b) the proportion of patients cured by
the “new” treatment is $p_2$ (i.e., there is an “important” advantage to the
new treatment). Then the probability of each outcome in the “little”
experiment can be easily calculated:


Outcome 
Number      Old           New           Probability of outcome 
   1        Cured         Cured         p1 p2 
   2        Not Cured     Not Cured     (1 - p1)(1 - p2) 
   3        Cured         Not Cured     p1 (1 - p2) 
   4        Not Cured     Cured         (1 - p1) p2 


Since we only consider outcomes 3 and 4, the proportion of the time in 
which outcome 4 will occur (which I will denote by p*) will be 

$$p^* = \frac{(1 - p_1)\,p_2}{p_1(1 - p_2) + (1 - p_1)\,p_2}$$


You may verify for yourself that when p1 = p2 then p* is going to 
be equal to 1/2 (no matter what p1 may be). 

On the other hand if, as we assumed earlier, the “new” treatment is 
advantageous (i.e., p2 is greater than p1) then p* will be greater than 1/2. 

Now what I have done is to assume that if p2 is enough larger than p1 
so that p* = .7, then this will represent an advantage for the “new” 
treatment that we will regard as “important.” To see what this means in 
terms of p1 and p2, look at Table I under the heading p* = .7. For 
example if the “old” treatment could cure 25% of the patients, then I am 
specifying that if the “new” treatment can cure 44% of the patients it 
would represent an important medical advance. 
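
The relation between p1, p2 and p* can be checked numerically. The sketch below is only an illustration: it takes p* to be the share of untied pairs in which the “new” treatment cures and the “old” one fails (the reading that reproduces Table I), and the function names `p_star` and `p2_for` are hypothetical, chosen here for convenience.

```python
def p_star(p1: float, p2: float) -> float:
    """Share of untied pairs won by the new treatment (assumed reading of p*)."""
    new_wins = (1 - p1) * p2      # old fails, new cures
    old_wins = p1 * (1 - p2)      # old cures, new fails
    return new_wins / (new_wins + old_wins)


def p2_for(p1: float, target: float) -> float:
    """Cure rate p2 of the new treatment that gives the chosen p* for a given p1."""
    # p*/(1 - p*) equals the odds ratio [p2/(1 - p2)] / [p1/(1 - p1)].
    odds2 = target / (1 - target) * p1 / (1 - p1)
    return odds2 / (1 + odds2)


if __name__ == "__main__":
    print(round(p_star(0.25, 0.44), 2))   # close to .7
    print(round(p2_for(0.25, 0.7), 2))    # Table I lists .44
    print(round(p2_for(0.25, 0.6), 2))    # Table I lists .33
```

Entering the second function with your own guess for p1 gives the value of p2 that the plans treat as “important” (p* = .7) or “small” (p* = .6).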

If you were contemplating using the sequential plans given in this 
paper you would enter Table I with your guess as to the proportion (p1) 
cured by the “old” treatment and then you could see whether the value of 
p2 (under heading p* = .7) seemed to be appropriate. If, for example, 
the “old” treatment cured 25% of your patients and if you would not 
want to miss a “new” treatment which cured 30% of the patients, then 
you would have to choose a plan which was based on a longer series than 
the examples given here. 

When we come to discuss the length of the series involved in the 
plans, it should be clearly understood that for any one experiment, the 
length of the series will depend on what happens in the course of that 
experiment. Since the schemes are closed we can talk about the maximum 
size of the series but it should be remembered that only rarely will 
the experiment actually require the maximum number. 

TABLE I 
SPECIFICATIONS 

Proportion of       Proportion cured by a “new” 
patients cured      treatment which represents:                    k 
by the “old”        An “important”     A “small” 
treatment (p1)      advantage (p2)     advantage          If            If 
                    (p* = .7)          (p* = .6)          p* = .7       p* = .6 

     .05                .11                .07               6.7           8.9 
     .10                .21                .14               3.7           4.7 
     .15                .29                .21               2.8           3.4 
     .20                .37                .27               2.4           2.8 
     .25                .44                .33               2.1           2.4 
     .30                .50                .39               2.0           2.2 
     .35                .56                .45               1.9           2.1 
     .40                .61                .50               1.9           2.0 
     .45                .66                .55               1.9           2.0 
     .50                .70                .60               2.0           2.0 
     .55                .74                .65               2.1           2.1 
     .60                .78                .69               2.3           2.2 
     .65                ...                .74               2.5           2.3 
     .70                .84                .78               2.7           2.6 
     .75                .88                .82               3.2           2.9 
     .80                .90                .86               3.8           3.5 
     .85                .93                .89               5.0           4.4 
     .90                .95                .93               ...           6.4 
     .95                .98                .97              14.7          13.0 


In Plan A the longest possible path is 48 squares. Note, however, 
that each “x” will represent an observation on two patients. Moreover 
“tied” experiments will not appear on the chart. 


On the average³ the maximum length of series (taking into account 
the ties) will be 

average maximum series = 2k (longest path) 

where 

$$k = \frac{1}{p_1(1 - p_2) + (1 - p_1)\,p_2}$$

³ If nothing but “tied” experiments occurred the experimentation could continue indefinitely. 
Members of the audience at the Blue Ridge Meeting suggested the simultaneous use of a second 
sequential chart which would plot “tied” against “non-tied” experiments. 

I have calculated “k” in Table I for various values of p1 and p2. The 
longest path in Plan A is 48 squares and in Plan B is 58 squares. 

I will illustrate the determination of the average maximum series by 
supposing p1 = .25, p2 = .44. From Table I (under the heading p* = .7) 
the value of k is found to be 2.1. Now for Plan A in this situation 

Average maximum series (A) = 2(2.1)(48) = 202 patients in expt. 

while for Plan B: 

Average maximum series (B) = 2(2.1)(58) = 244 patients in expt. 
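
The arithmetic of this illustration can be sketched in a few lines. The form of k used below, the reciprocal of the chance that a pair is not a tie, is an assumption inferred from the entries of Table I; the function names are hypothetical.

```python
def ties_factor(p1: float, p2: float) -> float:
    """k: average number of pairs observed per untied pair (assumed form)."""
    untied = p1 * (1 - p2) + (1 - p1) * p2   # probability that a pair is not a tie
    return 1 / untied


def average_maximum_series(k: float, longest_path: int) -> float:
    """2k(longest path): two patients per pair, k pairs per plotted square."""
    return 2 * k * longest_path


if __name__ == "__main__":
    k = round(ties_factor(0.25, 0.44), 1)        # Table I gives k to one decimal
    print(k)
    print(round(average_maximum_series(k, 48)))  # Plan A, longest path 48 squares
    print(round(average_maximum_series(k, 58)))  # Plan B, longest path 58 squares
```

Rounding k to one decimal before multiplying follows the worked figures in the text; carrying more decimals changes the totals by a patient or two.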


You will note from Table I that the value of k is around 2 for values 
of p1 in the neighborhood of .5. However, if p1 is near zero or near one 
then k can get very large. Most experimenters realize that when their 
treatments are rarely successful or nearly always successful it requires a 
very large experiment to distinguish between two treatments. 

Instead of considering maximum series I might instead consider the 
average length of the series or the median lengths. I will use the latter 
value since the median can be interpreted very simply. Half of the 
experiments would require a shorter series and the other half would run 
to a longer series. The median values for the two plans are given in 
Table II. 


TABLE II 

                                               Median length     Median length 
                                               of path for       of path for 
                                               Plan A            Plan B 

If there is really no difference between 
If the “new” treatment has an “important” 

Table II indicates that, unless you are a pessimist, you need not 
think in terms of the “average maximum series.” There is a fifty-fifty 
chance that the experiment will be terminated by the time a series 
roughly half as long as the “average maximum series” has been collected. 
Note that if there really is an “important” difference between the two 
treatments, then the experiments using Plan A will tend to be shorter 
than if, in fact, there is no difference. Plan B does not show this charac- 
teristic. 

If you want to think in terms of this “median length” then the average 
median length of the series will be 

2k (median length). 


Roughly speaking these particular sequential plans will require, on 
the average, about two thirds as many observations as “fixed sample 
size” plans with the same specifications. 


5. Variations on a Theme 


The performance of Plan A is summarized in Table III. In this table 
the proportion of experiments which wind up in each barrier is given for 
different values of p*. 


TABLE III 
PERCENTAGE FALLING IN BARRIER 

p*        Top        Middle      Bottom 
.50      10.75       78.50       10.75 
.60      49.95       49.21        ... 
.65      73.99       25.87        ... 
.70      90.54        9.46         .01 


When properly interpreted, Table III tells the story about Plan A. 
If there is no difference between the two treatments (p* = .50), then 
there is about a 10% chance of saying (erroneously) that the “new” treatment 
is superior, and also about a 10% chance of saying that the “old” 
treatment is superior. A little less than 80% of the time we would be 
led to the correct statement that there is no “important” difference 
between the two treatments. 

If the “new” treatment offers an “important” advantage (p* = .70) 
then 90% of the time we will say that the “new” treatment is superior 
and about 10% of the time we will erroneously state that there is no 
“important” difference between the two treatments. 

It was earlier noted that Plans A and B are designed to catch 
medium-sized differences between the treatments, and that small differences are 
likely to be missed with these schemes. 

If there is only a small difference between treatments (p* = .60) then 
we will say (correctly) that the “new” treatment is superior only about 
half of the time. There is even a very small chance (less than 1%) that 
we will say that the “old” treatment is superior. 

For intermediate differences (p* = .65) Plan A will “pick up” the 
difference about three quarters of the time. 

Table IV gives the performance for Plan B. 

TABLE IV 
PERCENTAGE FALLING IN BARRIER 

p*        Top        Middle      Bottom 
.50       4.90       90.20        4.90 
.60      37.12       62.88         .20 
.70      85.95       14.04         .01 


It will be noted that with Plan B there is a much smaller chance of 
saying that the “new” treatment is superior when (in fact) it is no better 
than the “old” treatment. However, the chance of missing an “important” 
difference is somewhat greater. 

The plans given here may be modified so as to serve more effectively 
in special experimental situations. The plans as they are given in Figures 
2 and 3 are symmetrical and this will be appropriate if the two treatments 
are equally convenient and if we would be interested in demonstrating 
the superiority of the “old” treatment. 

Many practical situations are not symmetrical. The “old” treatment 
may be a tried and true therapy and we may want to stick with it unless 
the “new” treatment is clearly superior. In this case if the path goes 
into either the bottom or the middle barrier we will recommend the “old” 
treatment. In this event if the path of the experiment should enter the 
region between the middle and the lower barrier we can terminate the 
experiment at once since the path must inevitably lead to recommending 
the “old” treatment. I am assuming that in this situation we are not 
interested in demonstrating that the “old” treatment is superior. 

In a few situations the “new” treatment is much more convenient to 
administer than the “old” treatment and in this event we might want to 
recommend the “new” treatment if it is as good as or better than the old 
one. In this event the experiment can be terminated as soon as the path 
enters the region between the upper and the middle barriers. 

Occasionally an experimenter wants to demonstrate that two treat- 
ments are not appreciably different. It is a very serious mistake to use 
the ordinary tests of significance for this purpose. These tests can only 
demonstrate differences; they cannot be used to demonstrate similarities. 
However, plans A and B can be used for this purpose. When they are 
so used the proper statement to make if the path of the experiment 
enters the middle barrier is: ‘There are no ‘important’ differences 
between the two treatments.” It is still possible, even likely, that small 
differences may exist. 

If you want to estimate the proportion cured by either treatment 
then the “little” experiments which are tied must also be considered. 
The estimated proportion of cures for the “new” treatment* is 


Number of expts. where both cured + Number of expts. where new “won” 
--------------------------------------------------------------------- 
               Total number of “little” experiments 


and for the old treatment is 


Number of cases where both cured + Number of cases where old “won” 
------------------------------------------------------------------- 
               Total number of “little” experiments 
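
A minimal sketch of these two estimates, assuming the four kinds of “little” experiment have been tallied separately. The function name and the counts in the usage lines are hypothetical, invented purely for illustration.

```python
def cure_estimates(both_cured: int, both_failed: int, new_won: int, old_won: int):
    """Return (new, old) estimated cure proportions from the four pair counts."""
    total = both_cured + both_failed + new_won + old_won
    if total == 0:
        raise ValueError("no little experiments recorded")
    new_rate = (both_cured + new_won) / total
    old_rate = (both_cured + old_won) / total
    return new_rate, old_rate


if __name__ == "__main__":
    # Hypothetical counts, not taken from any experiment in the paper.
    new_rate, old_rate = cure_estimates(both_cured=10, both_failed=20,
                                        new_won=15, old_won=5)
    print(new_rate, old_rate)
```

The two kinds of tie enter differently: pairs in which both were cured count toward both numerators, while pairs in which both failed count only in the total. That is why the separate record of ties recommended earlier is needed.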


Now that you have had a chance to see how these sequential plans 
operate I hope that you will be tempted to try them in your experimen- 
tation. I have only calculated out a few such plans because I do not 
know if there will be much demand for them—also I do not know what 
specifications would be most useful in medical research. If you think 
this sort of scheme might be useful in your experimentation I would 
appreciate it if you would drop me a card with your comments. If there 
is enough interest, I will try to publish a series of sequential medical 
plans. 

This work has been done under an Office of Naval Research Contract. 
Miss Ann Pfeiffer has performed the computations. 


Appendix 


Plans A and B were constructed by a quasi-binomial process similar 
to the one discussed by Sir R. A. Fisher in this issue of Biometrics. 
However, the barriers were derived by a cut-and-try method using the 
specifications and the further criterion that the junction between the 
central and upper barriers should occur close to a square corresponding 
to p* = .6. 

The “Efficiency” of these plans is therefore somewhat less than the 
maximum theoretical “Efficiency.” In order to calculate the “Efficiency” 
a rather arbitrary definition must be laid down. My definition 
involved the following considerations: 

(1) A one-tailed test was used. This corresponds to the modification 
of the plans wherein the experiment stops if it enters the region between 
the bottom and middle barriers. 

(2) The “Fixed sample size” test for comparison is the elementary 
normal deviate test. 

(3) There is no advantage from pairing. 

The “Fixed sample size” was calculated from the formula 

$$\text{Fixed sample size} = \frac{(\delta_\alpha + \delta_\beta)^2\,(p_1 q_1 + p_2 q_2)}{(p_2 - p_1)^2}$$

where δ_α is the normal deviate associated with a probability of a type I 
error equal to α, and δ_β is the corresponding normal deviate for a type II 
error equal to β. 

*The estimate is slightly biased. 
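
The formula can be tried out numerically. The sketch below rests on several assumptions that the text does not state: that q = 1 - p, that the deviates are one-tailed, that α and β may be read from Tables III and IV (the top barrier at p* = .50 and the middle barrier at p* = .70), and that the total for both treatments is twice the formula's value. The function name is hypothetical.

```python
from statistics import NormalDist


def fixed_sample_size(p1: float, p2: float, alpha: float, beta: float) -> float:
    """Evaluate (d_alpha + d_beta)^2 (p1 q1 + p2 q2) / (p2 - p1)^2."""
    std_normal = NormalDist()
    d_alpha = std_normal.inv_cdf(1 - alpha)   # one-tailed deviate for type I error
    d_beta = std_normal.inv_cdf(1 - beta)     # deviate for type II error
    q1, q2 = 1 - p1, 1 - p2                   # assumption: q = 1 - p
    return (d_alpha + d_beta) ** 2 * (p1 * q1 + p2 * q2) / (p2 - p1) ** 2


if __name__ == "__main__":
    # Assumed error rates: Plan A from Table III, Plan B from Table IV.
    n_a = fixed_sample_size(0.50, 0.70, alpha=0.1075, beta=0.0946)
    n_b = fixed_sample_size(0.50, 0.70, alpha=0.0490, beta=0.1404)
    print(round(2 * n_a), round(2 * n_b))     # doubled to cover both treatments
```

Under these assumptions the doubled values for p1 = .50 come out close to the 150 and 172 listed in Table V, which suggests, but does not establish, that the table counts both treatment groups together.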

The average sample size in the sequential situation is 2k (Mean 
Path) where the “Mean Path” is the average length of path for the 
modified plan if p1 = p2 (p* = .5). The efficiency depends on p1, as is 
indicated in Table V. 


TABLE V 

                  PLAN A                            PLAN B 
          Fixed        Average              Fixed        Average 
 p1       Sample       Sequential           Sample       Sequential 
          Size         Sample Size          Size         Sample Size 

0.10       280           176                 322           179 
0.20       176           114                 202           115 
0.50       150            95                 172            96 
0.90       728           338                 842           339 


It should be appreciated that in a particular experiment the sequential 
scheme might require more observations than the “Fixed sample size” 
plan. Such a situation, while not impossible, would be fairly rare (if 
p1 = .20 and Plan A is used the chances would be about one in ten). 
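
The size of the saving in Table V is easiest to see as a ratio of the two sample sizes. The short sketch below simply divides the tabled entries; the dictionary layout and function name are hypothetical conveniences.

```python
# Table V entries: p1 -> (fixed sample size, average sequential sample size).
TABLE_V = {
    "A": {0.10: (280, 176), 0.20: (176, 114), 0.50: (150, 95), 0.90: (728, 338)},
    "B": {0.10: (322, 179), 0.20: (202, 115), 0.50: (172, 96), 0.90: (842, 339)},
}


def sequential_to_fixed_ratio(plan: str, p1: float) -> float:
    """Average sequential sample size as a fraction of the fixed sample size."""
    fixed, sequential = TABLE_V[plan][p1]
    return sequential / fixed


if __name__ == "__main__":
    for plan, rows in TABLE_V.items():
        for p1 in rows:
            print(plan, p1, round(sequential_to_fixed_ratio(plan, p1), 2))
```

For Plan A with p1 between .10 and .50 the ratio stays near two thirds, in line with the rough figure quoted in the body of the paper; at p1 = .90 it is lower for both plans.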

In view of the expensive nature of clinical experimentation and the 
possible savings in observations indicated in Table V, sequential plans 
would seem to be worthy of a trial in clinical research. 


THE DESIGN AND INTERPRETATION 
OF CLINICAL EXPERIMENTS WITH DRUGS 
The Biometric Society (ENAR)* 


A CLINICAL COMPARISON OF MODIFIED INSULINS 
I. THE CLINICAL PROBLEM 


J. L. Izzo 


University of Rochester 


I hope that this paper will illustrate some of the difficulties that 
we clinicians encounter in the evaluation of drugs and the need for 
collaboration between the clinical investigator and the biometrician. 
In this instance, we are concerned with a quantitative comparison at the 
clinical level of modified insulins in routine treatment of diabetes mellitus. 

For the benefit of those lacking medical training, I should like to 
define briefly the disease and the problems of therapy with the drug 
insulin. Diabetes mellitus is characterized by an impairment in the 
ability of the body to utilize sugar secondary to either an absolute or 
relative deficiency in insulin secreted by the pancreas. This leads to 
a continuing hyperglycemia and glycosuria. If uncorrected, a deficiency 
state ensues, followed by a sequence of events that eventually leads to 
ketosis, acidosis, coma and death. Injections of adequate amounts of 
insulin can completely reverse this process and re-establish the normal 
state. However, unlike most drugs, the problem with insulin is concerned 
not only with the administration of adequate quantities but also with 
proper timing. 

*New York, N. Y., April 14, 1952. 

In the early days of insulin therapy when only solutions of the unmodified 
hormone were available, definite shortcomings were noted. The 
rapid absorption of the hormone from the subcutaneous tissues gave 
a prompt, intense but short-acting effect. Multiple injections were needed 
during the day to control the hyperglycemia and glycosuria; in severe 
cases, an injection was required even during the hours of sleep. Wide 
fluctuations in blood sugar level with resultant waves of hypoglycemia 
alternating with hyperglycemia and glycosuria were common. This was 
a far cry from nature’s delicately balanced mechanism which provides a 
steady regulated secretion of insulin from the pancreas in accord with 
the metabolic need of the organism. In the normal individual, the blood 
sugar is maintained within limits of 70-110 milligram-percent fasting 
and usually below 160 milligram-percent following meals. 

These difficulties led to intensive efforts to prolong the action of 
insulin by delaying its absorption. The first successful long-acting 
preparation to gain wide acceptance was protamine zinc insulin intro- 
duced in 1936. By virtue of its slow absorption from the subcutaneous 
depot, injections given once daily (usually before breakfast) provided a 
smoother control of the blood sugar level. Nevertheless, two major 
disadvantages were noted, particularly in so-called “severe” diabetes, 
both due to its slow but uncertain rate of absorption. First, the response 
to repeated doses of the same size was variable, frequently producing 
unpredictable waves of hypo- and/or hyperglycemia. Secondly, the 
use of doses large enough to prevent glycosuria following meals during 
the day not infrequently produced hypoglycemic shock during the night 
period of fasting. Doses small enough to avoid nocturnal hypoglycemia 
tended to permit heavy glycosuria following meals. 

For these reasons, increasing attention has been directed toward 
developing a depot insulin which when given once daily would be more 
ideally suited to meet the metabolic needs of the large majority of 
diabetics. Several such preparations, designed to produce an effect 
intermediate between that of unmodified insulin and standard protamine 
zinc insulin have been proposed. Of these, the most commonly used and 
widely studied have been certain modifications of protamine zinc insulin 
and globin insulin with zinc. Used alone in single daily injections, 
these intermediate insulins have proved superior to standard protamine 
zinc insulin in controlling the blood sugar level. However, some discord 
and confusion has prevailed in regard to their relative clinical timing 
and usefulness in the treatment of “severe” diabetes. 

In the hope of clarifying some of these controversial issues, five of 
the intermediate insulins, namely, the 2:1 mixture made by mixing two 
parts of unmodified insulin and one part of protamine zinc insulin, type 
NPC-40, type NPH-50, globin insulin with zinc, and standard protamine 
zinc insulin, were compared at the clinical level (1). This study differs 
from previous similar studies in respect to (1) the period of observation 
and collection of data, (2) the basis on which the insulins are compared, 
and (3) the selection and classification of patients for investigation. To 
a large extent, the reported discrepancies in the comparison of long-acting 
insulins can be explained from differences in methods of study, 
analysis of data and selection and classification of patients. 

Nineteen women patients with diabetes mellitus were used in this 
investigation. They were selected specifically to obtain a wide range 
in age, duration of diabetes, insulin requirements, and stability of carbo- 
hydrate metabolism. The classification of the patients on the basis of 
stability is especially important in the later interpretation of results. 

It became obvious early in the investigation that despite meticulous 
control of the external factors, the response in some cases was strikingly 
different from that in others. From these observations, the patients in 
the investigation were separated into two rough clinical groups, irre- 
spective of the amount of exogenous insulin required or the type of 
insulin used. These have been called the relatively stable group and the 
relatively unstable group. No sharp line of demarcation is possible be- 
tween the groups. Nevertheless, most patients could be classified with- 
out difficulty after a sufficient period of observation under controlled 
conditions. 

In the unstable group, control was difficult. “Control” here implies 
not only complete absence of glycosuria but also maintenance of blood 
sugar levels within the normal range. Under the conditions of the study, 
the typical unstable patient was subject to wide, unpredictable swings 
in blood sugar level; intermittent or irregular glycosuria of variable degree 
was the rule; reactions to insulin tended to be frequent, severe, difficult 
to eliminate and aggravated by attempts to maintain normal blood sugar 
levels; ketosis was prone to develop rapidly; slight excesses or deficiencies 
of insulin were apt to produce marked hypo- or hyperglycemia, respec- 
tively. In contrast, it was possible to approach control in the stable 
patient; variations in blood sugar level were decidedly less; virtually 
complete absence of glycosuria was possible without the penalty of 
frequent or severe insulin reactions; the response of the blood sugar to 
changes in dose was sluggish; ketosis was not observed. 

Throughout the period of study, the patients lived quietly in the 
hospital under uniform conditions. Each patient was maintained on a 
standardized weighed diet considered to be appropriate for the particular 
individual, breakfast at 8:00 AM, lunch at 12 noon and supper at 5:00 
PM. Where necessary for the comfort of the patient, meals were supplemented 
by in-between mid-afternoon and/or bed-time feedings. 
Each patient was observed from one to six weeks in the hospital 
before the insulin comparisons were begun. For a given patient, the 
over-all variability among blood and urinary sugar levels remained quite 
constant during this period. Each of the five insulins mentioned was 
compared in each patient, except that six patients were not tested with 
protamine zinc insulin. The order in which they were given varied 
from patient to patient. Insulin was always administered hypodermically 
in a single dose before breakfast. The comparisons were based on the 
performance of each insulin over one or more periods of three to ten 
consecutive days (mode of five days). On changing from one insulin to 
another, a period of adjustment (usually one to three days, depending 
on insulin type) was allowed before the comparisons were made. With 
few exceptions, it was impossible to maintain the insulin dose in the indi- 
vidual patient constant throughout the study. Changes in tolerance 
necessitated slight adjustments in dose from time to time. Blood sugars 
were determined four times daily at 7:30 AM, 11:30 AM, 4:30 PM, and 
9:30 PM. An illustration of the blood sugar data is given in Table 1. 


TABLE 1 

BLOOD SUGAR DETERMINATIONS FOR PATIENT NO. 11 DURING PERIOD ON 
NPH-50 INSULIN 

“Group”    Date     Dose in units    7:30 AM    11:30 AM    4:30 PM    9:30 PM 

   1       10/31         50            134         103          44        147 
           11/1          50            163         153         141         85 
   2       11/2          48            127         162         148         55 
           11/3          48             87          56          50        163 
           11/4          48            182         140         122        105 
11/4 48 182 140 122 105 


Twenty-four hour urines were collected and analyzed for sugar in 
four periods: 7:30-11:30 AM, 11:30 AM-4:30 PM, 4:30-9:30 PM, and 
9:30 PM-7:30 AM. Since the urinary sugar data reflect the same patterns as 
the blood sugars, they will not be discussed. 


REFERENCE 


(1) Izzo, J. L. and Crump, S. L. A clinical comparison of modified insulins. Journal 
of Clinical Investigation 29: 1514-1527, 1950. 


II. THE BIOMETRICAL ASPECT OF THE PROBLEM 


S. Lee Crump 


University of Rochester 


The purpose of this symposium is to discover some general principles 
of clinical investigation which are either illustrated or suggested by 
the studies discussed. Dr. Izzo has indicated three ways in which the 
present study differs quite basically from all other clinical insulin com- 
parisons that we know about. I shall outline my remarks around these 
three points of departure. 

1. The period of observation and collection of data. In previous studies 
insulins have been compared on the basis of blood and urinary sugar 
data collected on isolated single days, frequently on only one day for 
each insulin type tested. The manner of choosing the days to be included 
is not always clear, but usually depended subjectively upon fasting blood 
sugar levels. In the present study, data collected on every day during 
the period of observation were utilized in the comparisons, with the 
exception of the one to three day adjustment period following a change 
of insulin type. Each patient received each insulin (with the exception 
noted by Dr. Izzo) for at least one period of not less than five consecutive 
days. The subjective choice of single days for the collection of data out 
of the total period of observation is open to serious theoretical objections 
which need not be emphasized here. 

2. The bases for comparing the insulins. The basis for comparing 
insulins in previous studies, though varying somewhat from one study to 
another, is essentially equivalent to comparing the mean blood sugar 
levels when the insulins tested are administered in the same dose. The 
implication in such a comparison seems to be that that insulin is best 
which maintains the blood sugar level lowest during the 24-hour period 
following the injection of a specified dose (in units). Such a comparison 
does not seem to us realistic from the patient’s viewpoint. Within 
reasonable limits, the mean blood sugar level during the 24-hour period 
following injection may be maintained at nearly any desired level by a 
suitable dose of any of the long-acting insulins. With the possible 
exception of a slight economic advantage for some one of the insulins, 
the amount of insulin required to maintain a specified mean blood sugar 
level is of little or no consequence to the patient. 

The most prominent feature of the blood sugar level of diabetics 
receiving insulin is not its elevation but its high variability relative to 
that in non-diabetics. The problem of controlling diabetes, as Dr. Izzo 
has indicated, is two-sided. The blood sugar level must be maintained 
both below an upper limit and above a lower limit as much of the time as 
possible. There is disagreement as to what are acceptable limits but 
however chosen, they differ from one patient to another, so that the basic 
problem is largely independent of the choice of limits. The greater the 
variability in blood sugar levels, the greater the chances that one of the 
limits will be passed some of the time, and conversely. 

In the present study, the insulins were compared on the basis of 
the variability in the blood sugar levels. There are two components of 
the variability which might be influenced differentially by the different 
insulin types. The first of these is the variability in level during the 
24-hour period between injections, intra-daily variation. Although it is 
possible to take action between injections to reduce or elevate the blood 
sugar as necessary, such action is often inconvenient and in any event 
defeats the purpose of a 24-hour insulin. On the whole, the insulin will 
be preferred which produces minimum intra-daily variation. 

In addition to the intra-daily variation, the mean daily blood sugar 
level varies from one day to the next even when the patient is receiving 
the same daily dose of a single insulin type. An insulin which controlled 
intra-daily variation very well might still be quite unsatisfactory if the 
mean daily levels varied excessively. We have called this variability 
inter-daily variation. 

The measures of intra- and inter-daily variation used in this study 
are defined exactly elsewhere (1). It is sufficient to note here that they 
are proportional to the logarithms of analysis of variance mean squares. 
When the insulins are compared, patient by patient, on the basis of 
intra- and inter-variability, no consistent differences are found. The 
magnitude of these variabilities seems peculiar to the patient and insensitive 
to the insulin type used. 
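
To make the idea of the two mean squares concrete, here is a rough sketch using the five days of Table 1. It is not the measure defined in reference (1): the one-way layout with days as groups, the base of the logarithm and the absence of any scaling constant are all assumptions made for illustration, and the function name is hypothetical.

```python
import math


def daily_mean_squares(days):
    """One-way analysis of variance with days as groups.

    Returns (within-day mean square, between-day mean square).
    Assumes every day has the same number of readings.
    """
    n_days, per_day = len(days), len(days[0])
    day_means = [sum(day) / per_day for day in days]
    grand_mean = sum(day_means) / n_days
    within_ss = sum((x - m) ** 2 for day, m in zip(days, day_means) for x in day)
    between_ss = per_day * sum((m - grand_mean) ** 2 for m in day_means)
    return within_ss / (n_days * (per_day - 1)), between_ss / (n_days - 1)


if __name__ == "__main__":
    # Blood sugars for patient no. 11 on NPH-50, as printed in Table 1.
    table_1 = [
        (134, 103, 44, 147),
        (163, 153, 141, 85),
        (127, 162, 148, 55),
        (87, 56, 50, 163),
        (182, 140, 122, 105),
    ]
    within_ms, between_ms = daily_mean_squares(table_1)
    # Logarithms of the mean squares; base 10 is an assumption.
    print(math.log10(within_ms), math.log10(between_ms))
```

The within-day mean square corresponds to intra-daily variation and the between-day mean square to inter-daily variation; a log scale keeps patients with very different variability on a comparable footing.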

On our scale of measure, there is no sharp dividing line between 
the values for non-diabetics and diabetics. Intra-daily variation ranges 
from about 2.00 in non-diabetics up to a maximum of 5.00 in the patients 
studied. Inter-daily variation ranges from about 2.50 in non-diabetics 
to 5.50 in this study. From one patient to another, the magnitude of 
the variability covers a wide range, beginning at levels approaching those 
of non-diabetics. In the beginning, the patients had been arranged in 
descending order of an ill-defined clinical concept of ‘stability’. When 
they are ranked on the basis of intra- and/or inter-daily variability, 
the agreement with the rank by clinical impression is very good. We 
have called the first seven patients in these rankings stable and the last 
12 unstable (1). The dividing point is arbitrary and our conclusions 
would be unaltered by moving the dividing point up or down a little. 

A measure of just the general magnitude of intra-daily variation in 
blood sugar levels will not reflect insulin differences of potential im- 
portance. The levels do not vary randomly about the daily mean. Our 
measure of intra-daily variation would not reflect a difference, for ex- 
ample, in the time of day at which high or low levels occur. We have 
attempted to describe the pattern of the intra-daily variation by plotting 
the mean deviation from the daily mean at the four specified times for 
each patient on each insulin. For example, the mean deviation at 7:30 
AM is the average of the deviations of the 7:30 blood sugar values from 
their corresponding daily means, averaged over all days on which the 
patient received a specified insulin. Some of the graphs obtained are 
illustrated by figures 1-3. 
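
The mean deviation described here is simple to compute. The sketch below applies the stated definition to the Table 1 readings for patient no. 11 on NPH-50; treating all five days as one period, regardless of the change in dose, is a simplification made for illustration, and the function name is hypothetical.

```python
TIMES = ("7:30 AM", "11:30 AM", "4:30 PM", "9:30 PM")


def mean_deviation_pattern(days):
    """Average, over days, of each reading's deviation from that day's mean."""
    n_days = len(days)
    totals = [0.0] * len(days[0])
    for day in days:
        day_mean = sum(day) / len(day)
        for slot, value in enumerate(day):
            totals[slot] += value - day_mean
    return [total / n_days for total in totals]


if __name__ == "__main__":
    # Blood sugars as printed in Table 1, one tuple per day.
    table_1 = [
        (134, 103, 44, 147),
        (163, 153, 141, 85),
        (127, 162, 148, 55),
        (87, 56, 50, 163),
        (182, 140, 122, 105),
    ]
    for time, dev in zip(TIMES, mean_deviation_pattern(table_1)):
        print(f"{time:>8}: {dev:+.2f}")
```

Because each day's deviations sum to zero, the four mean deviations also sum to zero, so the pattern shows only where in the day the high and low levels tend to fall, not how high the overall level is.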

In considering the pattern of intra-daily variability, the factor of 
stability is all important. In the seven stable patients, defined as 
above, the intra-daily variability is distributed more or less uniformly 
over the four times of day at which the blood sugar was determined. 
Moreover, this pattern is the same for each of the five insulins tested. 
In the 12 unstable patients, the intra-daily variability shows striking 
pattern differences from one insulin to another. The patterns for 
protamine zinc insulin (P.Z.I.) and for globin insulin are distinctly 
different from one another as well as from those for the three modifica- 
tions of P.Z.I. These patterns are shown in figures 1-3. The graphs 
for the 2:1 mixture and for NPC 40 insulin are not shown since they are 
essentially the same as those for NPH 50 insulin. Statistical analysis 
verifies that the apparent pattern differences among the insulins in the 
unstable patients could scarcely arise by chance. In summary, the only 
insulin differences that we can detect are in the pattern of the variability, 
and those only in unstable diabetics. 

[FIGURES 1-3. Panel title: GLOBIN INSULIN WITH ZINC]

3. The selection of subjects for investigation. The selection of patients 
observed in other studies falls roughly into one of three classes: (a) a more 
or less random selection of all diabetics, (b) a selection of only those 
diabetics whose blood sugar levels could be maintained within a rather 
narrow zone around the normal level under conditions of the study, 
and (c) a selection of “severe” diabetics using the size of the insulin 
dose required to control hyperglycemia and glycosuria as the sole criterion 
of severity. In the present study, the patients were selected not at 
random but on the basis of clinical impression and preliminary observa- 
tion to cover the whole range of what we call stability. The group which 
we call unstable is represented by 12 of the 19 patients studied, but 
probably comprises less than 25 percent of the total diabetic population. 
Selection of type (b) above would exclude all but four or five of the 
patients in our study. “Severity” as measured by insulin dose seems 
to bear no relation to our concept of stability. 

We feel from physicians’ comments that the satisfactoriness of the 
control of a patient reflects what we have attempted to measure, though 
perhaps unconsciously. When groups are selected by method (a), the 
differences between insulins shown in a few patients are liable to be 
obscured by the many. When selection is of type (b), there will be 
no detectable insulin differences. Selection of type (c) is nearly equiva- 
lent to random selection in respect to stability. 

This study seems to point up two general recommendations in the 
conduct of clinical drug comparisons: 

1. Care must be taken to insure that the comparisons are made on 
bases which really reflect the control of the condition in the patient 
which the drugs presume to treat. 

2. Subgroups of the diseased population under study may react 
differently than the main group. These must not be overlooked, even 
though relatively infrequent. 

Before closing these remarks, it is pertinent to note a few ways in 
which the plan of the investigation would be altered if it were carried 
out again. An element of subjectivity was left in the choice of the length 
of the adjustment period following a change in insulin type. Since there 
was no strong reason for varying the length of this period from one patient 
to another, or from one insulin to another, it would be more satisfactory 
on general grounds to pick a single length of adjustment period and use 
it throughout the investigation. As the study was conducted, the order 
of administering the different insulins varied haphazardly from one 
patient to another. This makes it impossible (or at least very incon- 
venient) to assess or adjust for possible carry-over effects which are 
different for the different insulins. In any future investigation, the 
order of insulin administration should be planned to make both the 
assessment and adjustment of these effects convenient. Finally, the 
variation in the numbers of days on which the different patients received 
the different insulins makes the analysis of the data rather tedious and 
has no compensating advantage. There seems to be no reason why a 
five-day period, say, could not be used throughout. 
DISCUSSION 


Alexander Marble* Drs. Izzo and Crump deserve congratulations on 
the detailed and careful study which they have made and on their 
attempt to apply mathematical treatment to the data obtained. From 
personal experience in somewhat comparable studies, I appreciate fully 
the difficulties and pitfalls in this type of investigation. A human being 
is a complex and often unpredictable subject, particularly if he has 
unstable diabetes. 

Fortunately, most patients with diabetes have the condition in a 
relatively stable form. However, even in these, many variables can 
alter the response to insulin. Among these are the composition of the 
diet and its distribution during the day, the amount of physical activity, 
the presence of infections or of other complications such as recognizable 
pituitary, thyroid, adrenal or liver disease, and other factors, less well 
defined, which alter the sensitivity to insulin. Unfortunately, it is these 
less definite and poorly understood influences which account for the 
unstable character of the diabetes in certain patients whose number has 
been estimated by Dr. Crump to comprise less than 25 per cent of the 
diabetic population. Most children and adolescents, many young adults 
and a few older patients make up this unstable group. In general, 
such patients are not overweight. 

The person with unstable diabetes has a sensitive and labile blood 
sugar level which rises easily and sharply after the ingestion of carbo- 
hydrate and falls sharply after the giving of rapidly-acting insulin. 
Moreover, the blood sugar is subject to shifts which are often unpre- 
dictable. There is an easy tendency to acidosis. Recent studies suggest 
that the blood and pancreas of these patients contain very little insulin 
as compared with that of older, obese patients with stable diabetes. 

All insulins are good, and with most diabetic patients any type may 
be used successfully provided adjustments are made in the amount, type 
and distribution of food. With those in the unstable group, fortunately 
the minority, difficulty is often encountered with any type of insulin if 
one strives to attain constantly a sugar-free urine and a blood sugar 
which approaches normal levels. However, even in this group, patience 
and perseverance will almost always be rewarded with a creditable result. 

Clinical experience suggests that the duration of action of insulins 
is about as follows: regular or crystalline, 5 to 7 hours; globin, 18 to 24 
hours; NPH, 24 to 30 hours; protamine zinc, over 36 hours. Keeping in 
mind that the longer an insulin lasts, the less active it will be at any 
one time, it is obvious that NPH insulin has a most advantageous length 
of action. One wants an insulin which can be given once daily before 
breakfast and this requirement is filled by NPH insulin in almost all 
patients. In a very few, however, it appears not to carry through suffi- 
ciently to provide a satisfactory fasting blood sugar 24 hours after 
injection. If one increases the dose in order to achieve such, severe 
hypoglycemia occurs in the afternoon and evening despite any reasonable 
redistribution of food. This unfavorable result is seen in my experience 
much more often with globin than with NPH insulin because of the 
apparent shorter action of the globin variety. 

With a significant number of patients, NPH insulin must be supple- 
mented with the regular or crystalline type in order to increase rapidity 
of action. Such insulin may be added to the NPH variety in the syringe 
with preservation of most of the rapid action of the added insulin. This 
cannot be done if protamine zinc insulin is used as the long-lasting type 
because of the excess of protamine present. The two types must either 
be given by separate injection or a so-called “insulin mixture” prepared 
in which the amount of regular insulin exceeds that of protamine zinc, 
usually in the proportion of about 2:1. 

Enough has been said regarding practical clinical problems to indicate 
that in the final testing of any insulin resort must be had to the patient, 
taking into account the age, body build, activity, diet and other factors 
which make the problem of one patient different from that of others. 
The results of the studies of Drs. Izzo and Crump emphasize the necessity 
for careful study and for recognition of differing types of patients before 
drawing conclusions as to the action and effectiveness of a given type 
of insulin. 

Paul Meier* In terms of the three categories which the authors have 
used, there remains little for me to discuss. The long and careful 
period of observation and general structure of the experiment seem 
appropriate to the end in view; the basis of comparison is related to the 
factor of interest, namely, variability; and I agree with the general 
principle of segregating the different types of patients. What I would 
like to do is to discuss briefly some of the assumptions which have to 
be made in analyzing data of this kind. 

The choice of a statistical model for this situation is by no means 
simple. Not only do individuals differ from each other in both pattern 
and variability, but their behavior involves long term trends, as indicated 
by the need for changing the dosages during the course of the study. 
As our knowledge is insufficient to determine a general model for the 
experimental material, we must be satisfied with a less complete descrip- 
tion. By restricting ourselves to small intervals of time and sub-classes 
of patients, about which we are willing to make some simple assumptions, 
it may be possible to derive some useful information. 

In the present study, each patient is treated separately. The day to 
day variations are taken as approximately normally distributed about a 
level which is constant within a “group” of days. The variance is 
assumed constant for all “groups” on the same insulin. Similar assump- 
tions are made with respect to the within day variations. On this basis, 
the logarithms of the mean squares provide approximately normally 
distributed quantities which can be used in tests of significance. 
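As a rough illustration of the last step (a sketch, not the authors' actual computation), two mean squares can be compared on the logarithmic scale. The large-sample approximation Var(ln s²) ≈ 2/ν, the function names, and the numbers used below are all assumptions introduced here.

```python
import math

def log_mean_square_z(ms1, df1, ms2, df2):
    """Approximate normal deviate for comparing two mean squares on the log scale.

    Assumes ln(mean square) is roughly normal with variance 2/df,
    a large-sample approximation that is not taken from the paper.
    """
    diff = math.log(ms1) - math.log(ms2)
    se = math.sqrt(2.0 / df1 + 2.0 / df2)
    return diff / se

def upper_tail(z):
    """One-sided normal tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical mean squares for one patient on two insulins.
z = log_mean_square_z(4.0, 20, 1.0, 20)
print(round(z, 2), round(upper_tail(z), 4))
```

The logarithm is used because the ratio of two mean squares becomes a difference, and the spread of that difference depends only on the degrees of freedom.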

The assumptions relative to variability between days appear reasonable 
enough and the results do not indicate any substantial differences 
between the several insulins. The testing of differences in within day 
variability presents a more delicate question. Since there appears to be 
a daily pattern, the distribution of the intra-daily variation estimate is 
unknown. Insofar as the test is expected to detect any difference in the 
effects of the several types of insulins, it is probably conservative, that is, 
the observed differences may be more significant than indicated. 

Alternative models could be used. For example, a simple functional 
form could be assigned to the within day pattern with parameters to be 
estimated from the data. If a linear form seemed reasonable, an indi- 
vidual’s average slope on a given insulin would describe his daily pattern 
on that insulin. An additional parameter is the residual variability 
after taking out the regression on the slope. This quantity could be 
viewed as a measure of the individual’s instability. 
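A minimal sketch of this alternative, assuming a straight-line daily pattern fitted by ordinary least squares to one day's readings; the function name and the readings are hypothetical, and averaging the slopes over the days on a given insulin would be a further step not shown here.

```python
def slope_and_residual_ms(times, values):
    """Least-squares slope of values on times, and the residual mean square."""
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_bar - slope * t_bar
    rss = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(times, values))
    return slope, rss / (n - 2)  # n - 2 degrees of freedom after fitting a line

# Hypothetical blood sugar readings at four times of day (hours).
hours = [7, 11, 16, 21]
readings = [180, 150, 140, 110]
slope, resid_ms = slope_and_residual_ms(hours, readings)
print(slope, resid_ms)
```

The slope plays the part of the daily pattern and the residual mean square the part of the instability measure described above.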

Such an analysis is subject to limitations similar to those mentioned 
above. Indeed, in the absence of detailed knowledge of the biological 
mechanisms, the same will be true of any model. Consequently, a 
refined statistical analysis is scarcely warranted. The most important 
feature of this investigation lies in the continuous observation of the 
patterns of all types of patients on the several insulins under essentially 
identical conditions. This permits a general comparison of patterns on 
the different insulins which is relatively free from bias. 
CLINICAL STUDIES OF ANALGESIC DRUGS* 


I. EXPERIMENTAL PHARMACOLOGY AND MEASUREMENT OF THE 
SUBJECTIVE RESPONSE 


Henry K. BEECHER 


Massachusetts General Hospital 


ABSTRACT** 


The principles and practices that have been established are few 
and in several instances they may seem most obvious to the casual 
observer. That they have not been obvious to the majority of individuals 
working with subjective responses can be demonstrated very easily by 
examining reports of investigations in this field. In summary, here are 
the principles and practices believed to be unquestionable essentials for 
most work of this kind. 

1. Subjective responses are the resultant of the action of the original 
stimulus and the psychic modification of that stimulus. 

2. Man is the essential experimental subject for definitive answer 
to questions in this field and men are easier to work with than women, 
for with men the controls are simpler. 

3. The investigating staff is constant during any given series of 
experiments. 

4. The “unknowns” technique is employed throughout. The agents 
tested and the time they are tested are unknown not only to the subjects 
but to the observers as well. This requires the use of placebos, also as 
unknowns. 

5. When a new agent is to be compared with agents of past experience, 
and this is nearly always the case, a standard of reference is required. 
Morphine in standard dosage, for example, serves as a standard for 
analgesia. 


*Presented at the symposium held jointly by the American Society for Pharmacology and Experi- 
mental Therapeutics and The Biometric Society (ENAR), New York, April 14, 1952. 
**Complete paper in SCIENCE, vol. 116, 157-162, 1952. 




6. Randomization of the new agent, the placebo, and the standard of 
reference is essential. 

7. Significant comparisons of the side actions of agents can be made 
only with doses of equal strength in terms of their primary therapeutic 
effect. 

8. Mathematical validation of supposed difference in the effectiveness 
of two agents is necessary. 

9. The subjective effect of drugs can be quantified accurately only 
when the placebo reactors are screened out. 

Regarding matters for further study, the following unproven postu- 
lates can be indicated by questions. There will be partisans for and 
against each but a good deal of evidence, as yet not conclusive evidence, 
can be marshalled in regard to each question. 

1. Can the intensity of any of the subjective responses be satisfac- 
torily quantified? If it can be, which factors predominate in influencing 
intensity, the original stimulus, the reaction to it (psychic modification), 
or both? 

2. Can one generalize that the maximum subjective effects are pro- 
duced rather early by effective agents and that no real increase in effect 
is produced by increased dosage? For example, morphine produces 
nearly its maximum pain relieving effect at about eight milligrams per 
70 kilograms of body weight. The dose-effect curve breaks sharply at 
this point. Larger doses will at great risk produce anesthesia and 
unconsciousness but these effects are outside those of analgesia. We 
are checking this for cough, for sleep, and for euphoria. 

3. What are the limits of usefulness of animals in the further study 
of agents designed to modify subjective responses, apart from screening 
for organic toxicity? 

4. What is the place of subjective responses that are produced experi- 
mentally as opposed to those that arise in pathology? As seems likely 
from the study of pain, we must determine if the subjective responses 
arising in disease are mandatory for all such studies which deal with 
subjective response. We do not yet know how inclusive this requirement 
is. 

In conclusion, we have shown what conditions are necessary for 
a proper evaluation of a number of drugs whose therapeutic effects are 
subjective. We agree that they are complex, that they are exasperat- 
ingly time consuming, and we wish it were not so annoying as it is to 
fulfill the necessary conditions. Tedious as these conditions are, we 
insist that they are not more costly than the empirical method, actually 
far less so. They lead to results far more accurately and more rapidly 
than the old-fashioned method of distributing drugs to practically everybody 
and gradually by trial and error arriving in decades or centuries 
at an approximation of the truth. To take an example, after all the 
centuries that morphine or opium has been in “common sense” use, 
this country has arrived at a dose that is twice as large, 15 milligrams, 
as one that gives essentially the maximum pain relief, eight milligrams. 
In the “common sense” method, the cost of the evaluation is borne not 
by the manufacturer but by the public. In the case of morphine or 
opium, the “correct” result was approximated in hundreds of years and 
the conclusion is only about 100 percent off. We can and should do 
better than that. 


II. SOME STATISTICAL PROBLEMS IN MEASURING 
THE SUBJECTIVE RESPONSE TO DRUGS 


FREDERICK MOSTELLER 


Harvard University 


In this paper two easily stated problems that arise in measuring 
subjective response to drugs are discussed. The first problem, concerning 
tests for changes in 2 x 2 tables, has sometimes been mishandled in the 
past, and is still not too well recognized. The second problem concerning 
the existence of placebo reactors and their possible effects on the evalua- 
tion and standardization of drugs, may or may not turn out to be of 
importance, but it clearly has not yet been adequately investigated. Both 
problems are illustrated in the comparative testing of two analgesics. 

Testing for Changes in 2 x 2 Tables. Suppose a number of experi- 
mental patients has been treated on two different occasions with Drug A 
and Drug B. In each case, we note which patients had nausea following 
each dose, leading to a record of one of the following four types for each 
patient. These are the four possibilities that can occur: 


Patient              Drug A     Drug B 
Jones, A. B.         Nausea     Nausea 
Smith, C. D.         Nausea     - 
Johnson, E. F.       -          Nausea 
Williams, G. H.      -          - 
With this information in hand, we find two questions commonly being 
asked. The first concerns the frequency of occurrence of nausea with 
either or both drugs. This is a usual question that classical binomial 
methods are suitable for solving and does not need to be pursued further 
here. The second common question is whether Drug A is more likely 
to produce nausea than Drug B, and that is the question that is often 
mishandled. 

The first way of mishandling the data is to set up a table like Table 1 
(labelled “Mishandling No. 1”). 


TABLE 1 
MISHANDLING NO. 1 

            Nausea    Not-Nausea    Total 
Drug A        18          82         100 
Drug B        10          90         100 
Total         28         172         200 


The same 100 subjects have all received both Drug A and Drug B. 
With Drug A 18 cases show nausea, and with Drug B 10 cases show 
nausea. The standard chi-square method (or t-test) might then be 
applied incorrectly to the difference between proportions in the lines 
for Drugs A and B, even though we have matched cases. The principal 
mistake here is that we are really interested in the patients, not in the 
fact that we have 200 doses. We suspect beforehand that persons 
experiencing nausea after Drug A may be likely to experience nausea 
after Drug B and vice versa. This expected correlation of outcome is 
one reason for using the matched doses on the same patients. What we 
need is a reclassification of the data using patients as the unit. This 
reclassification leads to the appropriate table—Table 2. 
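The reclassification can be sketched as a simple count over per-patient records. This is only an illustration: the function name and the record layout are assumptions made here, and the records are rebuilt from the counts quoted in the text, with the cell letters C, D, E and F following Table 2.

```python
from collections import Counter

def paired_table(records):
    """Count patients in the four cells of the paired 2 x 2 table.

    records: iterable of (nausea_on_A, nausea_on_B) booleans, one per patient.
    Cell letters follow Table 2: C both, E only Drug A, D only Drug B, F neither.
    """
    labels = {(True, True): "C", (True, False): "E",
              (False, True): "D", (False, False): "F"}
    counts = Counter(labels[(bool(a), bool(b))] for a, b in records)
    return {cell: counts.get(cell, 0) for cell in "CDEF"}

# Records rebuilt from the text: 9 both, 9 Drug A only, 1 Drug B only, 81 neither.
records = ([(True, True)] * 9 + [(True, False)] * 9
           + [(False, True)] * 1 + [(False, False)] * 81)
cells = paired_table(records)
print(cells)
print("Drug A nausea:", cells["C"] + cells["E"],
      "Drug B nausea:", cells["C"] + cells["D"])
```

The margins of Table 1 (18 and 10) are recovered from the cells, but the cells cannot be recovered from the margins, which is why the patient must be the unit.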

In the appropriate table, the patients are cross-classified by nausea 
or not-nausea on each drug, giving us four kinds of cases. Nine cases had nausea 
with both drugs; 81 cases never had nausea. Drug A produced nausea in 
nine cases when Drug B did not, and there is one case with nausea 
from Drug B but no nausea from Drug A. Our primary interest is 
whether Drug A produces nausea more often on the average than Drug B. 
For this purpose, we are not interested in the whole table but only in the 
two terms of the diagonal labelled E and D. (This results from the fact 
that we want to test whether C + D = C + E for population rather than 
sample values.) Here there are only 10 cases that give us information as 
to whether the incidence of nausea differs from Drug A to Drug B. 
TABLE 2 
APPROPRIATE TABLE 

                                Drug A 
                      Nausea    Not-Nausea    Total 
Drug B   Nausea        9 (C)       1 (D)        10 
         Not-Nausea    9 (E)      81 (F)        90 
         Total        18          82           100 

$$ t = \frac{|E - D| - 1}{\sqrt{E + D}} = \frac{|9 - 1| - 1}{\sqrt{10}} = \frac{7}{3.16} = 2.22 \text{ st. dev.} $$

$$ P(t > 2.22) = 0.013 $$


The fact that some subjects had nausea both times and others had 
no nausea on both occasions is good information for other purposes, 
but not for testing whether Drug A differs from Drug B with respect to 
nausea incidence. This is determined from the frequency of cases in 
cells E and D of Table 2 by the standard t-test, shown immediately 
below Table 2. The difference between D and E, neglecting the sign, 
is diminished by one and divided by the square root of the sum of D 
and E to obtain 2.22 standard deviations. The subtraction of unity is 
a correction for continuity equivalent to Yates’ correction of one-half 
in the more usual chi-square test. We decide, therefore, that Drug A 
is really worse than Drug B with respect to producing nausea. 
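The arithmetic just described can be checked with a short sketch; the function names are introduced here for illustration, and the normal tail is taken from the error function rather than from tables. At full precision the ratio is about 2.21 (the printed 2.22 comes from dividing 7 by the rounded 3.16), and the tail probability still rounds to 0.013.

```python
import math

def corrected_change_statistic(e, d):
    """(|E - D| - 1) / sqrt(E + D), the continuity-corrected deviate from the text."""
    return (abs(e - d) - 1) / math.sqrt(e + d)

def upper_tail(z):
    """One-sided normal tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z = corrected_change_statistic(9, 1)  # E = 9, D = 1 from Table 2
p = upper_tail(z)
print(round(z, 2), round(p, 3))
```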

The test is equivalent to testing whether p = 1/2 on the basis of 
the outcome of E + D binomial trials. It is possible therefore to use 
exact rather than approximate methods. The probability of getting 
9 or more heads in a sample of 10 coin flips is (1 + 10)/2^10 = 11/1024 = 
0.011, which agrees closely, of course, with the probability associated with 
a t-value equal to or greater than +2.22 which was 0.013. This general 
technique for dealing with changes in a 2 x 2 table was not developed 
by the author of this paper. A good many people seem to have developed 
it independently. However, the earliest published source known to the 
author is McNemar (7). The author has, however, applied Tocher’s 
approach (8) for obtaining the best Neyman-Pearson test for changes in 
such 2 x 2 tables. It turns out that the exact binomial test mentioned 
above is most powerful, except for randomization. The randomization 
seems to the author to be necessary to take care of the discreteness 
of possible significance levels for such discrete problems, and therefore 
the randomization resembles a rounding procedure. So it seems fair to 
say that the exact binomial is the most powerful test for changes, except 
for rounding. (Readers interested in further applications will want to 
consult Denton & Beecher (3), and also Cochran (2) for a discussion of 
many related and more complicated problems. It probably should be 
mentioned for practitioners’ benefit that Cochran’s description of the 
test for correlated proportions in the 2 x 2 table does not contain the 
correction term, nor does McNemar’s paper. The correction was de- 
scribed by Edwards (4) and also suggested to the author by Wallis 
(personal communication) after the publication of the formula without 
the correction term in the Question and Answer column of the American 
Statistician (5).) 
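The exact binomial computation mentioned above is easy to reproduce. The sketch below (the function name is an assumption made here) sums the upper tail of a binomial distribution with p = 1/2 and keeps the result as an exact fraction.

```python
from fractions import Fraction
from math import comb

def binomial_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), as a fraction."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)

p = binomial_upper_tail(9, 10)  # 9 or more of the E + D = 10 informative cases
print(p, float(p))
```

No randomization step is included, so the attainable significance levels are the discrete ones discussed in the text.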

Having indicated a good method for analyzing the appropriate table, 
I would like to mention another type of mishandling. The other mistake 
is to set up the appropriate table, and then to compute the chi-square 
test for association. However, we are not here interested in the fact that 
the coincidence of no nausea or of nausea may be high with both drugs. 
This coincidence merely indicates that these subjects were susceptible 
to neither drug or to both drugs. With similar drugs we anticipate 
similarities of response. In this comparison, we are interested in points 
of difference in response, not in commonness of response. 
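To see that the association test answers a different question, the sketch below computes the ordinary uncorrected chi-square for association on the counts of Table 2. The function name is introduced here for illustration only.

```python
def chi_square_association(c, d, e, f):
    """Uncorrected chi-square (1 d.f.) for association in a 2 x 2 table.

    Cells are laid out as in Table 2: first row (C, D), second row (E, F).
    """
    n = c + d + e + f
    numerator = n * (c * f - d * e) ** 2
    denominator = (c + d) * (e + f) * (c + e) * (d + f)
    return numerator / denominator

chi2 = chi_square_association(9, 1, 9, 81)
print(round(chi2, 1))
```

The statistic is very large because most patients respond alike to both drugs; it is driven by cells C and F and says nothing about whether E exceeds D.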

Existence of Placebo Reactors. The second problem is that of placebo 
relief. Some experimental work on chronic headaches by Jellinek (6) 
indicates the nature of the problem. Persons with chronic headaches 
were given various drugs, including placebos, and they reported relief 
or no relief after each administration. Two of the drugs gave about the 
same overall percentage relief when tested on the whole group. The 
drugs were identical except for the absence of one component from one 
of them. 

In five treatments with placebos, a large fraction of patients was 
never relieved, while another large fraction was relieved three or more 
times. The distribution was distinctly U-shaped. Such a result suggests 
a differential response to placebos, and though such a response might be 
graded, as a first approximation we could divide the patients into two 
groups and name these placebo reactors and placebo non-reactors for 
convenience. When the placebo reactors were removed from the data 
before comparing the two drugs in question, Jellinek found the difference 
in relief from the drugs increased so much that the presence or absence 
of the special component seemed clearly to have an important effect 
for placebo non-reactors, and contradicted the initial conclusion derived 
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from considering data for the whole group. If there are at least two 
classes of persons distinguished by their response to placebo, we might 
conceive that placebo reactors respond in large part to drug administra- 
tion per se, while the placebo non-reactors might discriminate better 
between drugs and dosages. It would seem reasonable to adjust dosages 
and evaluate drugs on the basis of those patients who can discriminate 
unless other considerations rule against such a procedure. The placebo 
reactors, if they exist, might change the slope of the dosage-response curve 
and in consequence the sensitivity of the experiment. Or the placebo 
reactors by their high relief rate may mask gains from a drug to the 
non-reactors. By separating placebo reactors from the non-reactors, 
experiments might be made more efficient in terms of reducing the 
number of observations. 

As far as the author knows, the differential rate of relief from drugs 
for reactors vs. non-reactors has only been reported in connection with 
Jellinek’s chronic headache patients. Whether such an effect can be 
observed in other situations is an open question that may take considerable 
research to answer. Chronic headache patients are a rather special 
group, of course. The author hopes the reader will not confuse the dis- 
cussion of differential placebo effects with the standard fact that placebos 
produce relief. This is well known, and the magnitude of this “placebo 
effect” is often reported in experimental work. 

There is a little additional information on the placebo-reactor ques- 
tion. Recently, Beecher (1) obtained some data on the oral adminis- 
tration of drugs with post-operative patients that can be used to illustrate 
the differential response issue. In post-operative patients, the experi- 
mental difficulties are much greater than is the case when using persons 
subject to some kind of chronic discomfort. In the hospital patient, 
wound pain decreases rapidly with time, so that we rarely have more than 
a few comparable doses to provide discrimination between the two kinds 
of reactors if they exist. Table 3 shows that for 131 patients given 
aspirin, 52 per cent were relieved (+) and for 116 given morphine or 
codeine, 40 per cent were relieved. The results on morphine and codeine 
were very similar so that they have been combined. 

The advantage of aspirin here is explained by the oral administration. 
These patients had also been given placebos, so that the data could be 
further subdivided as in Table 4. 

Here we might ask whether aspirin is better than placebo and whether 
morphine-codeine are better than placebo. From the diagonal numbers, 
42 and 19, we find that aspirin is better than placebo (P = 0.0024). 
Although morphine-codeine were better than the placebo, the difference 
in the diagonals 24 and 16 is less, and the significance less emphatic 
(P = 0.134). 
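The text does not say how these two probabilities were obtained. As a hedged check, the sketch below applies both the continuity-corrected normal approximation and the exact binomial tail to the diagonal counts; the function names are assumptions made here. Under the normal approximation the values agree with the printed 0.0024 and 0.134 to the digits shown.

```python
import math
from math import comb

def approx_tail(larger, smaller):
    """Normal tail for the corrected deviate (|a - b| - 1) / sqrt(a + b)."""
    z = (abs(larger - smaller) - 1) / math.sqrt(larger + smaller)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def exact_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

for label, larger, smaller in [("aspirin vs placebo", 42, 19),
                               ("morphine-codeine vs placebo", 24, 16)]:
    n = larger + smaller
    print(label,
          round(approx_tail(larger, smaller), 4),
          round(exact_tail(larger, n), 4))
```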
TABLE 3 

                        Relieved (+)    Not Relieved (-)    Total 
Aspirin                   68 (52%)            63             131 
Morphine and Codeine      46 (40%)            70             116 
Total                    114                 133             247 

P = 0.06 

TABLE 4 

                     Aspirin                      Morphine or Codeine 
                 +      -    Total                +      -    Total 
Placebo +       26     19      45    Placebo +   22     16      38 
Placebo -       42     44      86    Placebo -   24     54      78 
Total           68     63     131    Total       46     70     116 

P(42 or more given 61) = 0.0024      P(24 or more given 40) = 0.134 


Let us carry this analysis one step further as in Table 5. The 
patients relieved with the placebo at least once have been collected into 
a group at the left and those not relieved by placebo are found in a 
separate group at the right. We now find that those relieved by placebo 
had the same percentage relief from aspirin (57.8%) as for morphine 
and codeine (57.9%). (Of course, we do not expect such close agreement 
in samples of this size to be observed very often.) Of those not relieved 
by the placebo, nearly 50 per cent were relieved by aspirin and only about 
30 per cent by morphine or codeine (P = 0.028). The results suggest 
strongly that as a group, placebo reactors do not differentiate well 
between doses of aspirin and morphine or codeine at the dosages used, 
but that the placebo non-reactors can differentiate as a group. Furthermore, 
it is interesting that for all drugs the placebo reactors have a 
higher percentage of relief than the placebo non-reactors. 

Since there was only one observation on the effectiveness of the 
placebo in each patient, we suppose that there are additional placebo 
reactors in the group labelled “not relieved by placebo”. 
TABLE 5 

                        Relieved by Placebo          Not Relieved by Placebo 
                        +           -    Total       +           -    Total 
Aspirin                26 (57.8%)   19     45       42 (48.8%)   44     86 
Morphine or Codeine    22 (57.9%)   16     38       24 (30.8%)   54     78 
Total                  48           35     83       66           98    164 

P = 0.028 (not relieved by placebo) 

This is just the beginning of the kind of data needed on this problem. 
It indicates rather clearly that the persons who reacted to the placebo 
were not able to discriminate between aspirin and the morphine-codeine 
dosages, whereas those who were not relieved by placebo could so 
discriminate. There is a further problem. There may not be a simple 
dichotomy between placebo reactors and placebo non-reactors, but rather 
varying grades of response. 
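The paper does not state which test produced P = 0.028 for the patients not relieved by placebo. One assumption that reproduces the figure is a two-sided chi-square test with Yates' correction on the right-hand half of Table 5; the sketch below, with function names introduced here, shows that computation along with the two percentages.

```python
import math

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square (1 d.f.) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def chi_square_p_1df(chi2):
    """Upper tail of chi-square with 1 d.f. (equal to the two-sided normal tail)."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Not relieved by placebo: aspirin 42 relieved, 44 not; morphine or codeine 24, 54.
chi2 = yates_chi_square(42, 44, 24, 54)
print(round(100 * 42 / 86, 1), round(100 * 24 / 78, 1),
      round(chi_square_p_1df(chi2), 3))
```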

Finally, we have some preliminary data not reported here for a few 
cases on placebos and morphine administered by injection rather than 
by the oral route. So far there is no evidence of differential effect for 
placebo reactors and non-reactors for these preliminary data. 

It should be noted that this paper is primarily methodological in 
intent. The reported substantive results are very preliminary and 
intended as illustration. Even so, they would not be reported in print 
at this time except for the strong urging of C. I. Bliss, who was responsible 
for obtaining written records of this Symposium. 


REFERENCES 


(1) Beecher, H. K., personal communication. 

(2) Cochran, W. G., “The comparison of percentages in matched samples”, 
Biometrika, Vol. 37 (1950) pp. 256-266. 

(3) Denton, J. E., and Beecher, H. K., “New analgesics”, Jour. Amer. Med. Assn., 
Vol. 141 (1949) 1051 et seq., 1146 et seq., 1148 et seq. 

(4) Edwards, A. L., “Note on the ‘correction for continuity’ in testing the significance 
of the difference between correlated proportions”, Psychometrika, Vol. 13 (1948) 
pp. 185-187. 

(5) “Equality of margins”, Questions and Answers, American Statistician, Vol. 1, 
No. 2 (October 1947) p. 12. 

(6) Jellinek, E. M., “Clinical tests on comparative effectiveness of analgesic drugs”, 
Biometrics Bulletin, Vol. 2 (1946) pp. 87-91. 

(7) McNemar, Q., “Note on the sampling error of the difference between correlated 
proportions or percentages”, Psychometrika, Vol. 12 (1947) pp. 153-157. 

(8) Tocher, K. D., “Extension of the Neyman-Pearson theory of tests to discontinuous 
variates”, Biometrika, Vol. 37 (1950) pp. 130-144. 




DISCUSSION 


Abraham Wikler¹ The work of Dr. Beecher and his colleagues in the 
field of pain and analgesia, which has been summarized tonight in such 
a lucid manner, emphasizes once again the necessity for keeping in view 
(a) the purpose of research in this field; (b) what it is that we are 
studying; and (c) the general principles which must govern all scientific 
investigations. These three aspects of Dr. Beecher’s presentation are 
interrelated, but they may be considered separately for convenience 
in discussion. 

There can be no quarrel with the thesis that one important purpose 
of research in the field of pain and analgesia is the development of 
technics which will enable any scientifically trained investigator to predict 
the relative efficacy of new drugs for relieving “natural” pain in 
man. However, this statement of the purpose of research in the field 
of pain and analgesia is not quite complete. We are looking for “good” 
analgesics, that is, those which relieve pain in a variety of clinical conditions 
in such doses as do not impair other important functions to a significant 
degree. In other words, we are searching for drugs which have a certain 
“pattern” of effects on patients with pain. In this sense, morphine is a 
“good” analgesic, for it exerts such a pattern of effects in a wide variety 
of clinical conditions. Barbiturates are probably not as “good”, for 
equal degrees of pain relief can be achieved with these agents only in 
doses that impair other functions of importance to the patient: wakefulness, 
ability to concentrate, judgment and neuromuscular coordination. 
In a hospital, such side effects may be unimportant. In the out-patient 
clinic, or in office practice, they must be taken into account. Acetylsalicylic 
acid may be classed as a “good” analgesic for some varieties 
of clinical pain, and as a “fair” or “poor” analgesic for others. Placebos 
are generally “poor” analgesics, but in particular conditions, they may 
be “good” ones. These considerations indicate the magnitude of the 
problem and they also suggest that no one method of measuring “analgesia” 
is likely to serve all purposes equally well. 


A consideration of what it is that we are measuring leads to similar
conclusions. Some ten years of experience in measuring the effects of
drugs on man, animals and animal preparations has convinced me that
what we are measuring can be described only in operational terms.
The effects of drugs cannot be described in terms of generalized
abstractions concerning what is “there” in the organism, but only in terms of
changes in instruments of observation which occur under conditions that
must be defined precisely. Therefore, in describing our results, we must
avoid, wherever possible, the use of such generalized abstractions as
subjective, objective, pain, analgesia, perception, reaction, psychic, and
somatic.

¹From the National Institutes of Health, NIMH Addiction Research Center, Public Health Service
Hospital, Lexington, Kentucky.

Although Dr. Beecher uses some of these terms, what he means by
them is implicit from the context. Pain and its relief are defined in
terms of verbal reports and certain non-verbal aspects of behavior in
subjects who are recovering from conditions with a good prognosis,
in a hospital with an excellent reputation under the guidance of eminent
clinicians whose only interest is known by the patient to be his own
welfare. As Dr. Beecher has noted elsewhere, the patient’s verbal and
non-verbal behavior may be at variance with each other and,
consequently, the observer must choose between them or weight one or the
other in some arbitrary way. Hence the observer is not excluded from
the measurements. One should not expect that the data so acquired
would be perfectly commensurable with data obtained in studies of
patients with pain due to cancer, for example, who know they are
suffering from a disease with a grave prognosis, in a hospital which is famous
for the excellence of its pathologic museum. Nevertheless, while Dr.
Beecher’s method enables one to define and measure analgesia in a
particular way, it is a way which is very useful for many clinical purposes.

Dr. Beecher’s criticism of the current use of “experimental” pain
in the study of analgesia is quite justified. The discordant results which
have been obtained with such methods in man, can be traced to fallacious
assumptions which are implicit in the design of many analgesic testing
methods. Much effort has been expended in quantifying pain stimuli
and the measurement of response, but little attention has been given to
the setting in which such measurements are made. Clinical pain almost
invariably occurs in a setting of anxiety, yet in ordinary threshold and
reaction studies on pain, this factor has been ignored. I believe this is
what Dr. Beecher has in mind when he stresses the need to take into
account the “psychic modification of stimuli”.

Our own work at Lexington indicates that it may be necessary to
induce anxiety deliberately in such experiments, in order to reproduce
in the laboratory a controllable phenomenon which is analogous to pain
as it occurs under clinical conditions. Thus, morphine has little effect
on the ability of post-addicts to discriminate intensities of electric shock
stimuli when experiments are carried out under “informal” conditions
which are designed to allay anxiety. In contrast, the drug does have
predictable effects under “formal” conditions, conducive to the
development of anxiety.
This operational principle can also be applied with advantage to
research on pain and analgesia in animals. Since the analgesic action
of morphine in man appears to be related to reduction of “fear of pain”,
animal experiments should be designed to study the effects of analgesics 
upon anticipatory responses to painful stimuli. Currently, analgesic 
testing procedures in animals are concerned exclusively with studies on 
the effects of drugs on single unconditioned reflexes. I cannot agree 
with Dr. Beecher’s statement that such methods have resulted in the 
discarding of many potentially useful analgesic agents. I am not only 
impressed but puzzled by the fact that the flick of a rat’s tail has proved 
to be such a good dial indicator of analgesia. 

Currently, the selection of a particular analgesic testing method is 
based on the implicit assumption that there is only one “mechanism”
by which drugs relieve pain—namely, that of morphine action. Actually, 
there is no a priori justification for this assumption. There are many 
varieties of pain and there may be many kinds of mechanisms through 
which pain is relieved by drugs. Though it is known that acetylsalicylic 
acid has analgesic properties, this is not evident when its analgesic 
potency is assayed by methods that have proved useful in comparing 
the efficacies of opiate-like drugs. One may have to devise new methods 
for studying the analgesic effects of agents like acetylsalicylic acid. The 
first step in such a direction would be a renewed study of the pharma- 
cology of acetylsalicylic acid, with a view to delineating specific patterns 
of action which are peculiar to it. 

Finally, one must consider the nature of controls in this sort of 
research. I cannot add anything to Dr. Beecher’s exposition of this 
subject. It should serve as a model for all studies on pain and analgesia 
in man. In animals, it is not so important to keep the observer in 
ignorance of the nature of the compound administered, but if reproducible 
results are to be obtained, all factors that are inherent in the experimental 
procedure must be analyzed and their effects on “stimulus values” 
determined empirically. If these are considered, experimental studies 
on pain and analgesia should yield data of even greater clinical usefulness 
than heretofore. 

N. B. Eddy.² One cannot overemphasize the lack of attention that
has been given to controls and the need for them in the study of the
effect of drugs on subjective responses. Obviously attention to controls
in this field is burdensome; it requires more time and care on the part
of the observer than he usually gives to the investigation. It also requires
more honest and courageous thinking than most clinical investigators
have brought to their task in the past.

²From the U. S. Public Health Service, Bethesda, Maryland.

As Dr. Beecher pointed out, man must be the final, if not the only 
test object for the measurement of subjective reactions. One must of 
course differentiate subjective reactions produced by the drug and those 
relieved by the drug. It is much simpler to deal with the former and if 
one uses normal subjects, administers a placebo as control, maintains 
a uniformity of environmental conditions and gives doses of drugs to 
be compared which have been shown to be equally effective therapeu- 
tically, comparison can be made on the basis of presence or absence of 
the subjective reaction. 

The subjective reactions to be relieved by the drug on the other hand 
must be naturally occurring; that is, pathologically produced, at least 
until such time as it is proved that an artificially produced subjective 
reaction and drug effects upon it are truly comparable to clinical con- 
ditions. This means the testing of a potential clinical agent under the 
conditions which will surround its ultimate clinical use and raises the 
all important question of measurement of intensity of a subjective 
reaction, pain for example. 

A recent study by Papper, Brodie and Rovenstine indicates that 
a third of post-operative patients have no pain, another third have 
mild to moderate pain relieved by one or two doses of an analgesic
and not recurring and only a third have “real pain” which requires 
more prolonged administration of an analgesic. Dr. Beecher’s studies 
on analgesics have employed post-operative patients with a similar 
observation with respect to the incidence of patients’ complaints. He 
has emphasized the desirability of comparing a substance with a standard 
and a placebo in the same patient, randomizing the administration of 
the three. Such a procedure needs conclusive evidence of absence of 
carry over from one administration of a dose of an analgesic to another 
in the same patient. Dr. Beecher, I believe, has been unable to find in 
his results evidence of such carry over either in the direction of enhanced 
effect or of “acute tolerance”. He has said, however, that “Morphine 
before a placebo gives confidence in the placebo and demonstrably 
increases its effectiveness. A placebo before morphine weakens the 
confidence of the subject in the drug’s powers to help and morphine is 
underrated.” 

In another place Dr. Beecher has distinguished between “placebo
reactors” and “non-reactors”. He has also differentiated between
patients whose complaint of pain does not recur after the second or third
dose of an analgesic and those whose recurrences of complaint continue
through 5 or 6 doses. These may be indications of differences in the
intensity of the subjective reaction or differences in the mechanism of
its production. Possibly placebo reactors and some at least of those
whose complaints do not recur after a dose or two of the analgesic are
patients with psychosomatic pain and the non-reactors and those
requiring many doses of an analgesic are patients with pain of real
pathological origin. In any case the separate consideration of a drug’s effect
in these several patient categories should have the advantage of matching
the drug against a more uniform group of subjective reactions and
perhaps will result in a better appraisal of a drug’s true analgesic effect.

Separate consideration of the patient categories may also permit
us to arrive at the best curve for the dose effect relationship. I find it 
difficult to accept a failure of dose effect relationship beyond 75-80% 
relief of pain. Beyond that point the curve may flatten but it seems 
strange that an 8 mg. dose of morphine, for example, gives about 75% 
relief and twice that dose no greater incidence of relief. However, 
whatever advantages we obtain from this division of results, if we are 
to predict a drug’s probable clinical usefulness, all of the results will 
have to be combined eventually in some manner, because all of the 
patient categories will be encountered in general clinical practice. I 
hope the statistician can help us here to properly weight and combine 
the results. 
THE CLINICAL ASSAY OF DIURETIC AGENTS*
I. BIOLOGIC CONSIDERATIONS WHICH DETERMINE THE DESIGN


THEODORE GREINER AND HARRY GOLD


Department of Pharmacology 
Cornell University Medical College 


In the past few years, diuretic agents have assumed a major role in 
the treatment of patients with heart disease and congestive failure. In 
response to the expanding need, many new agents have been developed, 
with the hope of finding an effective one, low in toxicity and convenient 
to administer. One of the things we need to know about new diuretics is 
the potency of their diuretic action. That knowledge makes it possible to 
administer them in the proper doses, and provides a background against 
which their toxic and other pharmacologic properties can be evaluated 
fairly. In the clinical testing to which they are ordinarily subjected, 
the potency of these agents is often not considered, and at best is only 
approximated. 

Assay of diuretic potency in animals is a preliminary step. Before 
a new drug merits quantitative studies in man, it must have passed a 
battery of animal tests and compare favorably with drugs currently in 
use. Assays in rats and dogs have been described, designed according to 
the requirements of the biometrician, and these provide reliable answers. 
Relative potencies in animals might be transferable to man when a
comparison involves different concentrations of the same compound but, 
in the comparison of materials of different constitution, the results of 
animal assay cannot be applied to man with any degree of assurance. 
Animal assay of new diuretic agents is limited in value, and comparison 
of the diuretic potency is secure only if the test is made on the patients 
who are to be treated, those in congestive failure. 


*Presented at symposium held jointly by the American Society for Pharmacology and Experimental 
Therapeutics and The Biometric Society (ENAR), New York, April 14, 1952. 
The design for bioassay involving patient-participants was developed
in connection with work on the mercurial diuretics and has been described
in a recent publication (2). It should be useful for comparing
the potency of other types of drugs, particularly those given at repeated 
intervals through a long illness. Fitting the therapeutic need of the 
patient into the experimental design is the key to human bioassay. In 
this particular aspect of our work, we have so far been successful, at 
least as judged by the attitudes expressed by our patients. In the study 
groups, they get the close attention of the doctors at all times, personal 
care for their minor problems, attention to their psychological needs that 
is seldom possible in out-patient departments, and, of course, the best 
cardiac therapy that current knowledge provides. 

To show a comparable diuretic response, patients with congestive 
failure and peripheral edema should be in approximately the same status 
at the time they receive each dose. A patient who is in advanced failure 
with 20 pounds of edema in his interstitial spaces will have a response 
greater than that produced by the same dose at another time when his 
failure is under control with only two pounds of excess fluid. Hospitalized 
patients are thus eliminated, since waiting between doses until the effect 
of the previous dose has been dissipated would tie up hospital beds 
for weeks. On the other hand, the clinic population of cardiac patients 
with congestive failure is an ideal group. These are patients who return 
at regular intervals for the administration of a diuretic to prevent the 
reaccumulation of edema. With proper adjustment, these patients return 
in approximately the same status at each visit, providing an opportunity 
for the same patient to receive all the doses of a projected assay. 

The next question was what to use as the diuretic response. The 
volume of urine flow is the common sign of diuretic action; perhaps of 
greater pharmacologic interest is the excretion of sodium chloride in 
the urine. Both these responses are closely related to the action of 
diuretics and, within certain limits, their magnitude reflects the thera- 
peutic value of the drug. However, the number of measurements that 
must be made for a quantitative biological assay is quite large and either 
response would be difficult to handle. Our experience in the management 
of patients with congestive failure provided a more practical, convenient 
answer in the loss of weight over the 24-hour period following a dose of 
diuretic. The loss of weight in this period represents the major part 
of the diuretic effect, namely, the mobilization and excretion of the 
excess extracellular fluid in patients with congestive failure and thus, 
as a response, it is closely related to the therapeutic action of these 
diuretic agents. Various routes of administration alter the time relationship
but even with oral administration, it was established that most,
if not all, of the diuretic action is present within the first 24 hours.
The dose-response relationship resembles that for the less convenient 
15-hour period, which has been shown to be a straight line over a 10-fold 
dosage range when the weight loss is plotted against the log-dose (1). 
This relation can be put to use in the comparison of an Unknown and a 
Standard. 
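
As a rough illustration of how a straight-line relation between weight loss and log-dose can be put to use, the sketch below estimates a log relative potency M from a symmetric two-dose layout. The function name and the mean weight losses are hypothetical, the Unknown is assumed to be given at doses thought equivalent to those of the Standard, and the formula is the usual textbook parallel-line estimate rather than a computation confirmed by the authors.

```python
import math


def log_relative_potency(s_low, s_high, u_low, u_high, dose_ratio=2.0):
    """Log10 potency of the Unknown relative to its assumed potency.

    Arguments are mean responses (e.g. pounds of weight lost) at the low and
    high doses of the Standard (s_*) and the Unknown (u_*). Both preparations
    are assumed to share one slope on the log-dose scale.
    """
    interval = math.log10(dose_ratio)  # spacing of the two doses in log units
    # Common slope: average rise from low to high dose, per unit of log-dose.
    slope = ((s_high - s_low) + (u_high - u_low)) / (2.0 * interval)
    # Average difference in response between Unknown and Standard.
    prep_diff = ((u_low + u_high) - (s_low + s_high)) / 2.0
    return prep_diff / slope


# Hypothetical mean weight losses, for illustration only.
m = log_relative_potency(s_low=1.0, s_high=2.0, u_low=1.5, u_high=2.5)
print(f"M = {m:.4f}, relative potency = {10 ** m:.3f}")
```

With parallel lines, the horizontal distance between them on the log-dose scale is the difference in mean response divided by the common slope, which is why the estimate takes this form.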

It is, of course, imperative to select appropriate patients. Subjects 
were selected who were sick enough to need diuretic agents and to show 
a response by a loss of edema fluid following the administration of a 
diuretic. Selection of patients naturally raises the question of bias, 
or of the need for restricting results to a special group. Are the com- 
parisons also valid for patients in the hospital with extreme failure? 
Although the response is known to be much greater, we have no reason 
to suspect that the relative responses will be altered. Under a regime 
of systematic treatment, few of the patients who receive diuretics pro- 
gress to a stage requiring hospitalization. The use of a supplementary 
dose, between the weekly assay doses, enabled us to maintain patients 
with moderately advanced failure in comfort, their weight returning 
approximately to the pretreatment level at the time of the next assay 
dose. With these patients included, the assay group comprised about 
one-half of the patients in our clinics with the diagnosis of congestive 
failure. Of the remainder, one-fourth were not suitable for assay, five 
percent because they were too advanced to be safe and 20 percent because 
they were too mild to lose weight when diuretic agents were given at 
weekly intervals. One or two of the latter were not detected until the 
data were analyzed. The remaining fourth of the patients were eliminated
by other factors. These include the uncooperative and unreliable,
and patients for whom regular attendance would cause hardship, either 
by reason of work or family responsibility, eliminating a few of the 
younger patients, or for physical disability or senility, eliminating a few 
of the older ones. With the possible exception of hospitalized patients, 
we believe that our sample represents the population receiving diuretics 
in clinical practice. 

Two important sources of error are the variations between patients 
in their response to the same dose and the variation in response within 
patients. The effect of variation between patients is brought under 
control by comparing the Standard and the Unknown at two or three 
dosage levels in each patient. The variation within patients is controlled 
by increasing the number of patients participating in the assay and 
assigning the doses in a random order. The random order also avoids 
possible bias from a fixed sequence of doses. 
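
The random assignment of dose order can be sketched as follows. This is only an illustration: the patient labels, the treatment list and the seed are hypothetical, and nothing is implied about how the random order was actually drawn in the clinic.

```python
import random


def random_dose_orders(patients, treatments, seed=None):
    """Give every patient the full set of treatments in an independent random order."""
    rng = random.Random(seed)  # a seeded generator makes the schedule reproducible
    orders = {}
    for patient in patients:
        order = list(treatments)  # copy, so each patient gets a separate shuffle
        rng.shuffle(order)
        orders[patient] = order
    return orders


# Hypothetical two-dose layout: Standard and one Unknown, each at two levels.
treatments = [(prep, level)
              for prep in ("Standard", "Unknown")
              for level in ("low", "high")]
schedule = random_dose_orders(["patient_1", "patient_2", "patient_3"],
                              treatments, seed=1952)
for patient, order in schedule.items():
    print(patient, order)
```

Because every patient receives every treatment, differences between patients drop out of the comparison, while the shuffled order keeps any carry-over or time trend from favoring one preparation systematically.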
Patients took their diet from a common low-salt list, and a special
scale was set aside in the clinic and used exclusively for measuring their 
weight. They wore the same clothes on each of the two days on which 
the effect of a particular dose was estimated. Some precision is lost by 
not having patients undress to be weighed. With the space and time 
available in busy out-patient departments, dressing and undressing would 
strain the facilities and reduce the number of patients in the assay. 
Thus, we deliberately accepted this source of error in order to handle the 
necessary number of patients. How important the factor may be, we 
are not certain, for many other variables are involved, such as the full 
or empty bladder, the full or empty bowel, the size of the breakfast, the 
amount of physical exertion, and atmospheric conditions which alter 
the amount of sweat. It is possible to control some of these but as more 
variables are controlled, increasing the precision, the assay becomes 
more unwieldy, until it is too complicated to perform at all. Instead, 
we controlled those factors which could easily be managed and depended 
upon randomization to avoid biasing our comparison between the 
Standard and test preparations. Further, possible help from covariance 
should be kept in mind and, in this particular case, covariance proved 
valuable in reducing the error associated with variations in the control 
weight, the control weight reflecting the amount of edema the patient 
had at the time a dose was given. 
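
The idea behind the covariance adjustment can be sketched in a much simplified form: regress weight loss on control weight and remove the part of each response that the regression accounts for. The data below are hypothetical, and a full analysis of covariance would estimate the slope from the error line of the analysis of variance rather than from a single pooled group as is done here.

```python
def covariance_adjust(control_weights, weight_losses):
    """Adjust each weight loss to the mean control weight by linear regression."""
    n = len(control_weights)
    w_mean = sum(control_weights) / n
    y_mean = sum(weight_losses) / n
    sxx = sum((w - w_mean) ** 2 for w in control_weights)
    sxy = sum((w - w_mean) * (y - y_mean)
              for w, y in zip(control_weights, weight_losses))
    slope = sxy / sxx  # pounds lost per pound of control weight
    adjusted = [y - slope * (w - w_mean)
                for w, y in zip(control_weights, weight_losses)]
    return slope, adjusted


def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical control weights (pounds) and 24-hour weight losses (pounds).
control = [150.0, 152.0, 154.0, 156.0]
loss = [1.1, 1.4, 2.1, 2.4]
slope, adjusted = covariance_adjust(control, loss)
print(f"slope = {slope:.3f}")
print("sum of squares before:", round(sum_of_squares(loss), 3),
      "after:", round(sum_of_squares(adjusted), 3))
```

The adjusted responses keep the same mean but vary less, which is the sense in which covariance reduces the error associated with differences in the amount of edema present when a dose is given.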

In the absence of a diuretic agent which has been established as a 
Standard for the comparison of other diuretic agents in humans, we 
selected the solution of mercuhydrin sodium of commerce. It has the 
following characteristics: (1) the material is stable; (2) the molecular 
structure is established and various lots are of substantially similar 
compositions; (3) it is well tolerated by intramuscular injection; (4) there 
is extensive clinical experience establishing its efficacy as an organic 
mercurial diuretic agent. The Standard under these circumstances speci- 
fies not only the material but the mode of injection, namely, the intra- 
muscular route. The value established for a new diuretic agent under 
these conditions would have meaning for the practicing physician, since 
it is expressed in terms with which he is familiar. 

The new assay presented in this symposium was designed to select 
the better of two drugs which had passed animal tests and had shown 
diuretic action after oral administration. The two materials were com- 
pared against intra-muscular mercuhydrin. Their potencies were deter- 
mined both by the oral and the intra-muscular route. Since the linear 
relation between log dose and weight loss had previously been established 
for the range we were using, we employed the more efficient two-dose 
design. A low dose and a high dose were selected for each substance to 
produce comparable responses in the therapeutic range. In the case of
drug V’ by the oral route (see Table 2 of the following paper), we compro- 
mised necessarily on a lower response so as to reduce the incidence of 
gastrointestinal symptoms. The pharmacological properties of the drugs 
are reported elsewhere (3). 

One of the difficulties encountered in human assay is that patient- 
participants are apt to be prima donnas, not always predictable. They 
cannot always be depended upon to complete the full course planned for 
them. They develop pneumonia and myocardial infarcts. They some- 
times do even worse—they go off on vacations. It is, therefore, wise to 
reduce the number of required doses for each patient to an absolute 
minimum. It is for this reason that we assayed four Unknowns (each of 
two compounds by two routes), using the Standard only once, although 
theoretically in the most efficient design for four Unknowns, the Standard 
should be tested twice. 

Summary. The biologic considerations for assaying diuretic agents 
in man, of which many are equally biometric considerations, may be 
summarized as follows: 

1. A pharmacologic survey in animals is a prerequisite, testing 
diuretic action, toxicity and other pharmacologic properties. 

2. Relative potency in animals is not necessarily transferable to man. 

3. The most favorable patient-participants are those with congestive 
failure in the ambulant state who present the same amount of edema at 
the time of each dose. 

4. Test response is the weight loss 24 hours after dose. 

5. Type of dose-response relationship must be established and slope 
redetermined with at least two dosage levels in each assay. 

6. Variations in the mean response of patients are controlled by 
giving each dose of a Standard and of each Unknown to every patient. 

7. Randomization of order of the doses balances the possible in- 
fluence of one dose on the action of subsequent doses and the effect of 
numerous variations within patients in salt intake, fluid intake, physical 
exertion, weather conditions, etc. 

8. Potency is determined in terms of a Standard, a stable drug in 
widespread therapeutic use. 
II. ESTIMATION OF THE ERROR IN A CLINICAL ASSAY


C. I. BLISS


The Connecticut Agricultural Experiment Station and Yale University 


The clinical assays described by Dr. Greiner follow a familiar factorial 
design. In the first assay, three doses of each preparation, a standard 
and two unknowns, were administered once to every patient. The order 
of dosing was random, except that Drug B was added after tests had 
been completed on Drugs A and C. This departure from randomness 
was reflected in the control weights and has been adjusted by covariance. 
Since the analysis of these observations (7) showed no appreciable de- 
parture from linearity, the number of doses was reduced to two in the 
second assay, in which two doses of standard and of each of four un- 
knowns were administered in a random order to each patient. The 
estimation of relative potency from both types of experiments is so well 
known (2, 3) that it need not concern us here. The error of such esti- 
mates, however, is still open to question. 

These assays were arranged in randomized blocks or groups, a group 
consisting of the responses of a single patient to a complete set of 
treatments. The reliability of comparisons between treatments depended 
upon the variability of these differences within patients, independently 
of the average weight lost by each patient. This experimental error is 
usually determined as the mean square in the analysis of variance for 
the interaction of patients by treatments (2, 3), including all treatment 
comparisons. Occasionally, this has underestimated the real error of
potency. In cylinder plate assays of penicillin, for example, the log
potency M from repeated assays in different laboratories, and even
in the same laboratory, has differed more widely than would be expected
from the standard errors of M, s_M (1). These cases were subject to
important sources of variation other than those between replicated
measurements within the component assays. More often, repeated assays
have agreed as well as they were supposed to and we have concluded
that an assay could be no better than was indicated by its s_M.
Recently, this conclusion has been challenged. Cornfield has reported
(4) significantly less variation in repeated estimates of the potency of
certain carcinogenic agents designed in Latin squares than would be
predicted from the individual s_M’s. A note by Finney and Wood (6),
developed more fully by Finney (5), reports a similar expectation for
randomized groups and discusses its meaning in terms of variance
components. Finney concludes from his model that the error variance used
in computing s_M should be restricted to a part of the interactions that
have previously been combined. I wish to consider both our original
three-dose assay and the new results with two doses in the light of this
recent hypothesis.


TABLE 1

ANALYSIS OF VARIANCE OF THE WEIGHT LOSSES IN A THREE-DOSE ASSAY COMPARING
TWO MERCURIAL DIURETICS WITH THE STANDARD, MERCUHYDRIN; RECALCULATED FROM
THE DATA FOR 16 MEN AND 22 WOMEN PATIENTS (7) AND CORRECTED BY COVARIANCE
FOR VARIATIONS IN THE CONTROL WEIGHT.

                                       Men                     Women
Line  Term                     D.F.   Mean square      D.F.   Mean square
  1   Patients                  15    214.13            21     79.63
  2   Drugs                      2    297.40             2    226.80
  3   Combined slope             1    552.88             1    234.73
  4   Parallelism                2    97                 2     15.55
  5   Curvature                  3     14.63             3     15.17
  6   Patients X drugs          30     30.89            42     17.87
  7   Patients X slope          15     22.96            21     14.80
  8   Remainder                 73*    22.48           104     13.69
  9   “Assay error”            118*    24.665 = s²     167     15.394 = s²
 10   Trend on control wt.       1    327.78             1    563.07

*Reduced by a missing value.


Let us examine first the mean squares in the analysis of variance of 
Table 1. In this three-dose assay, the data for 16 men and for 22 women 
have been reanalyzed separately and the results adjusted by covariance 
for differences in the control weights. A female patient omitted in the 
original analysis because of a marked negative slope has been included 
here to avoid any bias in the interaction of patients by slope. The varia- 
tion in the total response of each individual gave the mean squares 
between patients in line 1 of Table 1. In this, as in other clinical assays, 
the segregation of overall differences between patients increased effec- 
tively the precision of the experiment. 
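
The degrees of freedom in Table 1 can be checked with a short sketch. It assumes, as an inference from the table and its footnote rather than a statement in the text, that the covariance on control weight removes one degree of freedom from the error in each sex and that the single missing value belongs to the men.

```python
def interaction_df(n_patients, n_treatments, covariates=0, missing=0):
    """Error d.f. for the patients-by-treatments interaction in randomized blocks."""
    return (n_patients - 1) * (n_treatments - 1) - covariates - missing


N_TREATMENTS = 3 * 3  # three preparations, each at three dose levels

men_error_df = interaction_df(16, N_TREATMENTS, covariates=1, missing=1)
women_error_df = interaction_df(22, N_TREATMENTS, covariates=1)

# The 8 treatment d.f. split into the factorial contrasts of lines 2 to 5.
treatment_df = {"drugs": 2, "slope": 1, "parallelism": 2, "curvature": 3}
# Each contrast interacts with the 15 d.f. between the 16 men.
men_parts = {name: 15 * df for name, df in treatment_df.items()}

print("error d.f.:", men_error_df, "(men),", women_error_df, "(women)")
print("men, patients x contrast d.f.:", men_parts)
```

Under these assumptions the totals agree with line 9 of the table, and the interactions with parallelism and curvature together give 75 degrees of freedom for the men before the two reductions, consistent with the 73 shown in line 8.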
The treatment effects have been subdivided factorially in lines 2 to
5. The average differences in the response to the three drugs (line 2)
gave so significant a mean square as to reject decisively the working
hypothesis that the drugs were equally potent at the test dosages. The
assay would have been more precise if the assumed and the assayed
potencies had not differed but this very uncertainty in our prior
knowledge was the reason for carrying out the assay. Significant differences
are bound to occur from time to time, even though we would like to
avoid them. The highly significant slope of the dosage-response curve
(line 3) demonstrates conclusively that patient response depended upon
the dosage level, even though the mean weight loss ranged from only
one-half to three pounds. The next two mean squares were of the same
magnitude as the error terms, from which we conclude that the
dosage-response curves for the three drugs could be considered as parallel
(line 4) straight lines free of curvature (line 5) at the dosage levels of
the present experiment.

The above conclusions are valid as judged with any of the mean 
squares in lines 6 to 9 of Table 1. These represent the so-called inter- 
action of patients with treatments and ordinarily they are not separated. 
On Finney’s hypothesis, however, the interaction of patients with drugs 
and of patients with slope have different expectations from the remaining 
interactions. They can be computed readily and are given in lines 6 
and 7. The variance for random sampling may be estimated from the 
remaining interaction (pooled in line 8) of patients by parallelism and 
patients by curvature. From Finney’s model, this has an expectation 
of σ² without additional variance components. In the present example,
none of the interactions with patients differed significantly from one 
another, although with both men and women, the interaction of patients 
by drugs (line 6) had a mean square about a third larger than the sup- 
posedly true error (line 8). In both sexes, the interaction of patients 
by slope was nearly equal to the supposedly unbiased estimate of the 
error variance. 
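
The comparison of lines 6 and 7 with line 8 amounts to forming variance ratios from the mean squares of Table 1, as in the sketch below. Only the ratios are computed; judging their significance would require the F distribution with the corresponding degrees of freedom, which is left out here. The dictionary keys are illustrative names, not the author's notation.

```python
# Mean squares copied from lines 6 to 8 of Table 1.
MEAN_SQUARES = {
    "men": {"patients_x_drugs": 30.89, "patients_x_slope": 22.96,
            "remainder": 22.48},
    "women": {"patients_x_drugs": 17.87, "patients_x_slope": 14.80,
              "remainder": 13.69},
}


def variance_ratios(ms):
    """Ratio of each interaction mean square to the remainder (line 8)."""
    base = ms["remainder"]
    return {term: value / base
            for term, value in ms.items() if term != "remainder"}


for group, ms in MEAN_SQUARES.items():
    for term, ratio in variance_ratios(ms).items():
        print(f"{group:5s} {term:17s} ratio to remainder = {ratio:.2f}")
```

The ratios for patients by drugs come out a little above 1.3 in both sexes, and those for patients by slope lie close to 1, in line with the description in the text.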

The present assay has been compared with Finney’s model in another 
way. The error variance o in the mean square for the interaction of 
patients by slope may be augmented by a variance component repre- 
senting differences in the expected slope from patient to patient. The 
interaction of patients by drugs contains the same slope component, 
multiplied by the true value of M’, the logarithm of the assayed potency. 
The other coefficients in the model do not concern us here. The more 
nearly the potency assumed in planning the assay approaches a true 
value of one, the more nearly its logarithm approaches 0 and the inter-
action of drugs by patients estimates only σ². In the present assay,
differences between drugs (line 2) were highly significant, so that the
slope component in line 6 should not vanish. Nevertheless, it does not 
belong in the error of potency because of the presumed correlation 
between the corresponding discrepancies in lines 6 and 7. This part of 
the hypothesis has been tested by computing the correlation in each
patient between his composite slope and the difference in his response
to the standard (Drug A) and to the most discrepant unknown (Drug C).
Differences in the control weight were adjusted by partial correlation 
to obtain the partial correlation coefficients of r = -0.081 for 16 men
and r = 0.187 for 22 women, neither value approaching significance.
There was no evidence in this assay of a variance component for slope
which would discredit the conventional pooled “assay error” in line 9.
The magnitude of the trend on the control weight, as eliminated from 
the assay error by covariance, is shown in the last line of Table 1. 

The second assay provided an additional test of Finney’s hypothesis. 
Four unknowns, two drugs administered both by injection and by mouth, 
were compared with the standard in 13 men and 13 women at two dosage 
levels having the ratio of 1 to 2. The individual responses, y, for 
each patient are given in Table 2, together with the treatment totals 
for y and for the coded initial weights, w. The treatment totals S(y) 
for the first four preparations were of about the same magnitude in 
both men and women and considerably larger than those for preparation 
V’, which had relatively little diuretic effect at the dosage levels of 
the present assay. This apparent difference was examined more closely 
by dividing the variation between drugs into two parts in the analyses 
of variance in Table 3, a comparison of the first four preparations with 
V’ (line 2) and the variability among these other drugs (line 3). A 
difference in method of administration is here equivalent to a separate 
drug. Two other treatment effects were determinable from the treat- 
ment totals, the effect of the combined slope in line 4 and a test for the 
parallelism of the dosage-response curves for all five preparations in line 5. 
The variation in each of these four treatment comparisons among patients 
is given in lines 6 to 9 of Table 3. Inequalities in the control weight, 
shown in Table 2 only as the totals S(w), have been corrected by covari- 
ance as before. 
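
As a rough check on the single-degree-of-freedom comparisons described above, the sums of squares for the combined slope and for V’ against the other preparations can be computed directly from the treatment totals S(y) for men in Table 2. The sketch below is illustrative only. It assumes that the ten totals are read as (low dose, high dose) pairs with V’ as the last pair, and it omits the covariance adjustment for control weight, so its results will differ somewhat from the adjusted mean squares in Table 3 (1069.73 and 1302.73).

```python
# Treatment totals S(y) for the 13 men, read from Table 2 as
# (low dose, high dose) pairs. Assumption: the last pair is V'.
totals = [(111, 237), (118, 159), (67, 187), (83, 142), (18, 51)]
n_patients = 13  # responses contributing to each total


def contrast_ss(values, coeffs, n):
    """Sum of squares (1 d.f.) for a contrast among totals of n responses each."""
    contrast = sum(c * v for c, v in zip(coeffs, values))
    return contrast ** 2 / (n * sum(c * c for c in coeffs))


flat = [t for pair in totals for t in pair]

# Combined slope: high dose minus low dose, summed over all five preparations.
slope_coeffs = [-1, 1] * 5
# V' against the other four preparations (8 totals versus 2 totals).
v_prime_coeffs = [1] * 8 + [-4, -4]

ss_slope = contrast_ss(flat, slope_coeffs, n_patients)
ss_v_prime = contrast_ss(flat, v_prime_coeffs, n_patients)

# These are unadjusted values; Table 3 reports them after covariance.
print(f"combined slope: {ss_slope:.2f}")
print(f"V' vs. others:  {ss_v_prime:.2f}")
```

Both unadjusted values come out slightly larger than the tabled mean squares, which is what one would expect if part of the apparent treatment effect is removed with the trend on control weight.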

Two observations on drug U’ were lost from each series in this experi- 
ment, largely due to side effects. The starred values in Table 2 were 
obtained by minimizing the interaction of patients by treatments for use 
in computing the assay error in line 10 of Table 3. The first part of the 
table, however, where the interaction of patients by treatments was 
subdivided, has been computed with replacements which minimized

TABLE 2


INDIVIDUAL ASSAY RESPONSES (y) IN UNITS OF THE WEIGHT LOST IN QUARTER-POUNDS
24 HOURS AFTER TREATMENT WITH A MERCURIAL DIURETIC AND THE TOTAL S(y) FOR
EACH TREATMENT. THE CORRESPONDING CONTROL WEIGHT TOTALS S(w) ARE MEASURED
FROM THE MINIMUM CONTROL WEIGHT OF EACH PATIENT. THE TWO UNKNOWN OR TEST
DRUGS WERE ADMINISTERED BY INJECTION (U AND V) AND BY MOUTH (U’ AND V’),
EACH AT TWO DOSAGE LEVELS ASSUMED EQUIVALENT TO ONE AND TWO CC OF THE
STANDARD MERCUHYDRIN, WITH THE ORDER OF TREATMENTS RANDOMIZED IN EACH PATIENT.


Weight loss (y) after treatment with 
Sex |Patient' Total 
No. Si So Vi V2 Vi’ V,’ 


Men 1 10 21] 13 22] 11 10 1| 101 
2 7 4 4 10 6 9 8 10 2 10 70 

3 8 10 18; —4 10 | 1 (4 58 

4 9 15] 24 10 2 14] 18 7 1 =-—s 92 

5 6 9] 9 4 =2 6 82 

6 8 18 0 Fist 10 2 55 

7 —1 19 3 610 6 29 8 8 8 —5 85 

8 li 5 22); 16*| 1 36] -—2 4} 109 

9 ll 19 8 7|-5 12% 5& 7|-—4 6 66 
10 1l 28) 15 3] 17 17) 10 7 8 9} 125 
ll 6 Mi nH 3 3] 119 
12 10 19 6 2 3 14 2 iW2i-—6 9 72 
13 10 30 8 14] 16 23 6 14 9 9] 139 
S(y) | 111 237 | 118 159 | 67 187 | 83 142 | 18 51 | 1173
S(w) | 170 193 | 145 148 | 140 122 | 206 130 | 150 177 | 1581
Women 1 12 1 18 4 16 5 82 
2 12 12 5 10] 12 6; 10 11 5 13 96 

3 4 5 3* 8 8 6 57 

4 12 26] 12 30] 11 24] 13 19] 185 

5 10 17] 138 16] 10° 12 7 8 4 6 | 103 

6 5 ©] W 16 0 10 6 8 0 4 70 

8 4 9 4 bi 2 Ws 8 7 4 6 65 

9 9 8 9 15 3 5 o Tf 6 3 84 
10 16 5! 10 14 2 14 10) 13 97 
ll 13 3 12 3 14 0 10| 2 65 
12 11 5] 14 7|-10 14 0 12 2 -8 47 
13 10 12 6 16 6 2 1 63 


S(y) | 117 145 | 92 167] 62 160| 73 12 1016 
S(w) | 208 124 | 137 177 | 124 187 | 164 195 | 157 198 | 1671


*Missing values, discarded because of vomiting or diarrhea; replaced so as to minimize treatment
by patient interaction or assay error.

TABLE 3

ANALYSIS OF VARIANCE OF THE WEIGHT LOSSES IN THE TWO-DOSE ASSAY OF TABLE
2, CORRECTED BY COVARIANCE FOR VARIATIONS IN THE CONTROL WEIGHT.

                                          Men                   Women
Line  Term                         D.F.   Mean square    D.F.   Mean square
  1   Patients                      12       75.00*       12       64.63*
  2   Drug V’ vs. others             1     1302.73**       1     1000.30**
  3   Among other drugs              3      102.33*        3       27.56
  4   Combined slope                 1     1069.73**       1      664.01**
  5   Parallelism                    4       67.63         4       23.26
  6   Patients × (V’ vs. others)    12       29.94        12       67.81*
  7   Patients × other drugs        36       33.19        36       20.61
  8   Patients × slope              12       53.73        12       17.50
  9   Patients × parallelism        45       35.93        45       27.40
 10   “Assay error”                105       34.266       93       23.303
 11   Trend on control wt.           1      236.70*        1      187.52*

*Statistically larger than its error at P < .05, **at P < .005.


the sums of squares in line 9 for patients by parallelism. The single
missing value for men in Table 1 was handled similarly.

As expected, the major variation among drugs was that between 
V’ and the others (line 2). The interaction of this contrast with men 
patients (line 6) was less than either that with the other four drugs 
(line 7) or the random variation (line 9), although the men patients 
varied still more in slope (line 8). Among the women, however, the
difference in their response to V’ and to the other drugs varied sig- 
nificantly more than the random error in line 9, although this was 
not paralleled by a larger interaction of patients with slope. Since 
the four interaction mean squares for women were significantly hetero-
geneous by χ² (P = .03), the assay error for women in line 10 has been
computed without the sum of squares in line 6. 
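
The text does not name the χ² test used for heterogeneity. On the assumption that it was Bartlett's test for the homogeneity of variances, the following sketch applies that test to the four interaction mean squares for women in lines 6 to 9 of Table 3. The function names are illustrative, and the tail probability uses the closed form that holds for exactly three degrees of freedom.

```python
import math


def bartlett_chi2(mean_squares, dfs):
    """Bartlett's corrected chi-square for k mean squares with given d.f."""
    k = len(mean_squares)
    total_df = sum(dfs)
    pooled = sum(ms * df for ms, df in zip(mean_squares, dfs)) / total_df
    m = total_df * math.log(pooled) - sum(
        df * math.log(ms) for ms, df in zip(mean_squares, dfs)
    )
    correction = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (k - 1))
    return m / correction, k - 1


def chi2_upper_tail_3df(x):
    """Upper-tail probability of chi-square with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Women, lines 6 to 9 of Table 3: mean squares and their degrees of freedom.
women_ms = [67.81, 20.61, 17.50, 27.40]
women_df = [12, 36, 12, 45]

chi2, df = bartlett_chi2(women_ms, women_df)
p_value = chi2_upper_tail_3df(chi2)
print(f"chi-square = {chi2:.2f} on {df} d.f., P = {p_value:.3f}")
```

Under this assumption the computed probability rounds to .03, in agreement with the value quoted in the text.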

The difference in the response on the standard and on Drug V’ has 
been correlated with the slope based upon all drugs for each individual 
patient, leading to the partial coefficients for men of r = .238 and for 
women of r = .608, the latter value being statistically significant at 
P = .05. Despite the small variability in the slope of the dosage- 
response curve among women patients, the larger the slope the greater 
was the difference in the response between V’ and the standard, in 
support of Finney’s hypothesis. 
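
The significance of these partial correlations can be checked with the usual t test for a correlation coefficient. The sketch below assumes that one covariate (the control weight) was eliminated, so that each coefficient from 13 patients carries 13 - 3 = 10 degrees of freedom; that count is an assumption and is not stated in the text. The critical value 2.228 is the two-sided 5 per cent point of Student's t for 10 degrees of freedom.

```python
import math


def partial_r_t(r, n, n_covariates=1):
    """t statistic and d.f. for a partial correlation from n individuals."""
    df = n - 2 - n_covariates
    t = r * math.sqrt(df / (1 - r * r))
    return t, df


T_CRIT_10_DF = 2.228  # two-sided 5% point of t with 10 d.f.

t_men, df_men = partial_r_t(0.238, 13)
t_women, df_women = partial_r_t(0.608, 13)

for label, t in (("men", t_men), ("women", t_women)):
    verdict = "significant" if abs(t) > T_CRIT_10_DF else "not significant"
    print(f"{label}: t = {t:.2f} on 10 d.f., {verdict} at P = .05")
```

On this reckoning only the coefficient for women exceeds the 5 per cent point, consistent with the statement above.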

Since these results support Finney’s model to so limited an extent,
it is pertinent to examine its applicability. The present assays do not
agree with one of his basic assumptions in that the unknowns differed 
chemically from the standard. In consequence, this was a comparative 
rather than an analytical assay. Clinical assays of drugs are so time 
consuming and expensive that we may expect them invariably to be 
comparative rather than analytical. It follows that the interaction of 
patients by drugs might contain a variance component not present in the 
interaction of patients by slope, due to variations in the relative sensi- 
tivity of individual patients to a specific unknown. This is suggested 
by the somewhat larger mean squares in line 6 than in lines 7 or 8 of 
Table 1. 

Finney’s model also assumes a linear dosage-response curve. With 
graded-response assays, we expect the curve at best to be an elongated 
sigmoid with a substantially linear central section. The reactions of 
some patients may fall outside this section in the curved portion ap- 
proaching an upper or lower asymptote. This could introduce an addi- 
tional variance component, both in the terms for curvature and in the 
interaction of patients with parallelism and with curvature. In a two- 
dose assay, it would affect only the interaction of patients by parallelism 
and might explain, for example, the rather larger mean square for women 
in line 9 of Table 3. 

In the present experiment, the simultaneous assay of several un- 
knowns has increased the degrees of freedom for estimating the error 
variance, so that certain components could be omitted without reducing 
seriously its stability. In contrast, an experiment with a single unknown 
and a standard on Finney’s hypothesis would require within-group
replication, which is usually impractical, or sacrifice two-thirds of the 
degrees of freedom in estimating the error of a two-dose assay or two-
fifths in determining the error of a three-dose assay. In a small experi-
ment, this could more than counterbalance a potential gain in precision
from omitting an extraneous variance component. 
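
The fractions quoted above follow from counting treatment comparisons. The sketch below assumes, as the earlier discussion suggests, that the two omitted components are the interactions of patients with drugs and with slope, and that every treatment comparison contributes the same number of interaction degrees of freedom (patients less one). The function name is illustrative.

```python
from fractions import Fraction


def error_df_sacrificed(n_doses, n_preparations=2, n_omitted=2):
    """Fraction of the patients-by-treatments d.f. lost when n_omitted
    single-d.f. treatment comparisons are dropped from the error."""
    treatment_df = n_doses * n_preparations - 1
    return Fraction(n_omitted, treatment_df)


# One standard and one unknown (two preparations).
print("two-dose assay:  ", error_df_sacrificed(2))  # 3 treatment d.f.
print("three-dose assay:", error_df_sacrificed(3))  # 5 treatment d.f.
```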

Despite the inadequacy of Finney’s model for comparative assays, 
his proposal focuses needed attention upon the error term. Especially 
if the mean response with an unknown does not differ materially from 
that with the standard, these clinical assays suggest that the variance 
components which Finney would omit are usually negligible. When there 
is a marked difference between the standard and the unknown, however,
a subdivision of the interaction of patients by treatment should be 
illuminating. If the several mean squares are heterogeneous, two possi- 
bilities may be considered. One is to compute the standard error of M 
with separate error variances for the numerator and for the denominator 
in the equation for M (8). The other is to eliminate the results of
patients with aberrant dosage-response curves in which the slope is the
reverse of that of the majority. This, in fact, led to the omission of one 
of 22 women in the initial analysis of the three-dose assay, although she 
has been restored in calculating Table 1. The clinician necessarily 
selects his patients, so as to avoid complications from other diseases
and types of instability. A strongly reversed slope of the dosage- 
response curve might also be considered in patient selection. 

A comparison of the statistics from the three-dose and two-dose 
assays of the present series in Table 4 indicates good agreement in 
their precision. This is measured primarily by λ, the observed standard
deviation about the dosage-response line converted to units of log- 
dose (2). In both assays, men were somewhat more sensitive test subjects 


TABLE 4

COMPARATIVE PRECISION OF CLINICAL ASSAYS OF MERCURIAL DIURETICS, WHEN
THE CONTROL WEIGHT IS ADJUSTED BY COVARIANCE.

                                      Men                            Women*
Statistic                 Three-dose      Two-dose        Three-dose      Two-dose
Slope, b                  15.71 ± 3.35    19.70 ± 3.55    10.38 ± 2.29    14.75 ± 3.11
Standard deviation, s     4.966           5.854           3.889           4.827
Degrees of freedom        118             105             159             93
λ = s/b                   .316 ± [S.E. illegible]  .297 ± .060   .375 ± .095   .327 ± .070
λ_c                       .304 ± .048                     .344 ± .056
**No. of responses
  for S.E. = 15%          50.3                            64.2

*Statistics computed with the following omissions: the data from one of 22 individuals in the three-
dose assay and the interaction in line 6 of Table 3. If all observations are included, λ_c = .382 ± .072,
requiring an estimated 79.4 responses on the standard and on each unknown.

**If the assayed potency agrees with its assumed value, the estimated number of patient responses
on the standard and on each unknown is equal approximately to 2λ_c²/s_M², where s_M = log (1.15) for
a 15 percent standard error (S.E.).


than the women, requiring fewer observations to obtain the same 
precision. 
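
The number of responses quoted in Table 4 can be reproduced approximately from the second footnote, read here as N ≈ 2λ²/s_M² with s_M = log10(1.15). That reading of the footnote, the use of common logarithms and the function name are assumptions of this sketch. Small differences from the tabled figures are to be expected, since λ is given to only three decimals.

```python
import math


def responses_needed(lam, se_fraction=0.15):
    """Approximate responses per preparation for a given percentage S.E.,
    assuming the assayed potency agrees with its assumed value."""
    s_m = math.log10(1 + se_fraction)  # standard error in log units
    return 2 * lam ** 2 / s_m ** 2


n_men = responses_needed(0.304)        # Table 4 gives 50.3
n_women = responses_needed(0.344)      # Table 4 gives 64.2
n_women_all = responses_needed(0.382)  # footnote gives 79.4

for label, n in (("men", n_men), ("women", n_women),
                 ("women, all data", n_women_all)):
    print(f"{label}: about {n:.1f} responses")
```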

Summary. Clinical biological assays may be arranged effectively 
in randomized blocks or groups, each group consisting of the responses 
of a single patient. The validity of basing the standard error of potency 
upon the interaction of groups by treatments has recently been chal- 
lenged, as including an extraneous variance component for variation 
in the slope of the dosage-response curve from group to group. When 
this hypothesis was tested empirically with the data from three-dose 
and two-dose comparative factorial assays, the results were largely
negative. Re-examination of the model suggests the possibility of two
additional components, one for curvature in the dosage-response relation 
of individual groups or patients, and, in comparative assays, another 
for variation in the relative response of individuals to qualitative differ- 
ences between the standard and the unknown. Especially if the assayed 
value of an unknown does not differ markedly from its assumed value, 
the mean square for the interaction of groups by treatments seems likely 
to give a valid standard error of potency in most comparative assays. 
In doubtful cases, the pooled interaction should be subdivided factorially 
and the resulting mean squares examined for homogeneity.

DISCUSSION


Robert A. Lehman* In discussing the paper by Gold, Greiner and 
Bliss, there are two aspects which I believe are worthy of consideration, 
first, a question of sampling and, second, one of experimental design. 

The average weight loss due to the highest dose of the most potent 
drug, A, in the ambulatory patients used in these experiments was less 
than three lbs. and the lowest average weight loss was 1/2 lb. One might
question whether all of these patients actually needed a diuretic and 
certainly all cardiologists would not prescribe a mercurial as often as 
once a week for patients with so little edema. Other workers (1, 2) 
have shown that about 90 percent of hospitalized cardiac patients lose 


*Campbell Pharmaceutical Co., New York.

better than 3 lbs. in response to a mercurial diuretic and 2/3 lose more
than 5 lbs. Hence, it appears that the results reported are based upon 
a very restricted sample of the population of cardiac patients. This 
does not affect the validity of the potency ratio but must be kept in 
mind in assessing its significance since there is no evidence that potency
ratios for these mercurials will be the same for patients with edema under 
control as for those in severe congestive failure. 

The doses as selected did not give the same average diuretic response 
for each drug. In fact, in the three-dose assay, the highest dose of the
least potent drug, C, gave a weight loss about equal to the lowest dose 
of the most potent, A. To select dosage levels expected to give a different 
response would be justifiable only if sufficient preliminary data had 
established linearity and parallelism throughout the full response range 
for all three drugs. It is unfortunate here because in experimental 
animals it has been shown (3) that the dose response curve for certain 
mercurial diuretics reaches a peak and falls off at higher doses and there 
is reason to believe that this is a characteristic effect of mercury com- 
pounds on the kidney. Drug C might then be incapable of producing as 
great a diuresis as Drug A at any dose, a serious shortcoming. On the 
other hand, it might give as great a diuresis as A if the dose were 
sufficient, in which case C would be just as useful as A if all other factors 
were equal. 

The authors are to be commended for accomplishing a satisfactory 
bioassay with clinical material. They contend that, with proper design, 
a valid potency ratio with its standard error can be obtained in spite 
of the numerous variables which are inherent in clinical data and I 
believe they have justified their contention. There is no reason why 
future bioassays of mercurial diuretics might not be further improved 
by a better selection of doses on the basis of preliminary tests and 
by using a representative sample of the clinical material, even though 
this may involve still more variables and a larger error. 

There is a definite place for methods of bioassay in man, even, as 
here, where no problem of potency control is in question. However, we 
should not lose sight of the fact that the selection of a diuretic by the 
cardiologist will depend much more on qualitative differences between 
drugs with respect to cardiac toxicity, irritation at the site of injection, 
cost and convenience of administration than upon the precise dosage 
ratio. In other words, comparison of absolute potency becomes para- 
mount in the clinic only for drugs which are in every other respect 
equally acceptable. I believe that the chief contribution of this paper 
lies in the demonstration of one way in which human data may be 
successfully quantified.

Donald Mainland* It is very desirable to explore the possibilities
of the application of such a powerful method as bioassay to therapeutic
trials; but it might be well to consider the problem from the point of
view of a statistically-minded clinical investigator, the kind of worker 
whom we wish to encourage. We may ask, for example, whether the 
advantage of such a method over simpler methods is great enough to 
compensate for the greater complexity of design and analysis, a com- 
plexity which would place such investigations outside the scope of many 
clinical workers. 

The problem of the error term in this investigation is an important 
and difficult one, especially in view of Finney’s recent publications. 
Dr. Bliss has explored the application of Finney’s proposals. His treat- 
ment of the problem was commendably undogmatic, in contrast to a 
rather positive attitude that is often adopted when a new method is 
proposed. With the growth of a statistical science, it is inevitable that 
opinions should change and it is admirable that statisticians should 
openly proclaim their changes of opinion but, if opinions were not 
affirmed with quite so much dogmatism as they often are, I think that 
the clinical investigator would form a better conception of the science. 

I think, also, that we might pay some attention to the statistically- 
minded clinician’s attitude to the whole question of error terms. When 
he sees, for example, 118 degrees of freedom for assay error from 16 
patients, or 264 degrees of freedom for error in an experiment on bleeding 
time in six guinea pigs, he is apt to feel skeptical, even if he knows that
tests have shown no significant heterogeneity between patients or be- 
tween animals. 

Finally, he is apt to be rather impatient of mathematical models 
applied without abundant experimental material. To collect and explore 
a large amount of experimental data is, of course, slow and tedious but 
it seems to me to be the only way of settling many disputed questions. 


Harry Gold We may well observe with intense satisfaction that a
symposium such as this is now possible. The application of biostatistics 


*New York University College of Medicine.

to assays to insure proper design and to avoid error in the interpretation
of results is now solidly established procedure in assays with lower
animals. But it is the special strength of this symposium that man has
displaced the mouse in the center of the stage. It has revealed the
possibility of applying effectively the powerful tools of biostatistics 
in ferreting out the truth about drug actions, especially their quanti- 
tative relations in the human patient-participant. Thus far, not many 
have risked an attack on the unknown in this manner but there is 
evidence of increasing endeavors in this division of clinical pharmacology. 
One can only hope that such activity will continue to gain momentum.

TABLES FOR CONVENIENT CALCULATION OF MEDIAN-
EFFECTIVE DOSE (LD50 OR ED50) AND INSTRUCTIONS IN
THEIR USE


S. WEIL 


Mellon Institute, Pittsburgh, Pa. 


To reduce the time of calculation of the median-effective dose and
its confidence interval to a minimum without the sacrifice of accuracy,
a group of tables has been calculated according to formulae presented
in the article by Thompson and Weil (1952). These tables allow the
use of 2, 3, 4, 5, 6, or 10 animals per dosage level, with 4 or more dosage
levels being tested per material (using K = 3), provided that the loga-
rithms of successive dosage levels differ by a constant (d); the geometric
factor is denoted by R and d = log R (e.g., d = 0.30103 for dosage
levels of 1.0, 2.0, 4.0, and 8.0 grams of material per kilogram of body
weight). A short discussion of the moving average and other methods of
estimation of the median-effective dose (ED50) follows. LD50 may be
used instead of ED50 if the critical response is death.

Estimation of the toxicity of a given chemical by single-oral dose to
rats, rabbits, guinea pigs, or mice or of single skin application to rabbits
or guinea pigs has been previously expressed in tables and reports issued
from this laboratory in the form of the LD50 calculated by the Bliss
(1935, 1935, 1938) method of probits. The term LD50 refers to the least
dosage that should be expected to kill 50% of the animals that received
it. Bliss defines the probit as five plus the equivalent normal deviation
(with unit standard deviation). His transformation converts the inte-
grated normal curve to a straight line that passes through the trans-
formed point (log LD50, 5). The calculations required in this estimation
of the LD50 are somewhat difficult and time-consuming, involving suc-
cessive approximations with tentative regression lines fitted by a method
of maximum likelihood. Several papers have been published on sim-
plifications and estimations of the probit LD50 and others on different
methods of curve-fitting such as the logistic function by Wilson and
Worcester (1943, 1944) using maximum likelihood, and by Berkson
(1944, 1946) using a method of weighted least squares. The purpose of
these transformations is to straighten the fundamental dosage vs. mor-
tality curve so that a straight line may be fitted to the transformed
points. However, there is danger that tendencies toward biased or
erratic estimates may be induced by mistaken assumptions about the
form of the fundamental curve or by the technics used for curve fitting.

Recently Thompson (1947) has published a method that involves the
use of moving averages and interpolation to estimate the median-
effective dose, ED50, identical with our LD50 if death is the critical
response. The use of moving averages is not new in mathematics or
statistics. It is a well-known graduation long used in time series analyses,
and has the following advantages: (1) it is free from the assumption as
to the precise type of fundamental curve involved but it is capable of
taking into account more of the data than any method that uses only the
data on both sides of the 50 per cent level of effectiveness; (2) only simple
computations are involved; and (3) it replaces the fitting of complex
mathematical curves. The ED50 by this method is computed by inter-
polation involving K + 1 or more dosage levels by the use of the respec-
tive arithmetic means of log dose and of fraction responding critically for
K successive points. In most ED50 determinations, too few animals are
dosed to establish with certainty the exact form of the dosage-mortality
curve that is involved.
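
The interpolation just described can be sketched in a few lines. The function below is an illustration written for this purpose, not Thompson's own formulation: it forms the K-point moving averages of log dose and of fraction responding, finds the adjacent pair of averages that brackets 50 per cent, and interpolates linearly between them. The data are those of a worked example used in this article (0, 0, 3 and 4 deaths of 4 animals at 2, 4, 8 and 16 gm./kg.).

```python
import math


def moving_average_log_ed50(log_doses, fractions, k=3):
    """Log ED50 by linear interpolation between successive K-point
    moving averages of log dose and fraction responding."""
    spans = len(log_doses) - k + 1
    avg_x = [sum(log_doses[i:i + k]) / k for i in range(spans)]
    avg_p = [sum(fractions[i:i + k]) / k for i in range(spans)]
    for i in range(spans - 1):
        low, high = avg_p[i], avg_p[i + 1]
        if low <= 0.5 <= high and high > low:
            f = (0.5 - low) / (high - low)  # interpolation fraction
            return avg_x[i] + f * (avg_x[i + 1] - avg_x[i])
    raise ValueError("moving averages do not bracket the 50 per cent level")


doses = [2.0, 4.0, 8.0, 16.0]            # gm./kg.
deaths = [0, 0, 3, 4]                    # of 4 animals at each level
log_doses = [math.log10(x) for x in doses]
fractions = [r / 4 for r in deaths]

log_m = moving_average_log_ed50(log_doses, fractions)
print(f"log ED50 = {log_m:.5f}, ED50 = {10 ** log_m:.2f} gm./kg.")
```

With four dosage levels and K = 3 there are only two moving averages, so the interpolation fraction computed here corresponds to the quantity f used with formula (1).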

No detailed comparison will be made here of the accuracy of the 
Thompson method vs. those of Bliss, Berkson, etc. A lengthy discussion 
of this may be found in Thompson’s paper (1947) and in the one by 
Armitage and Allen (1950). However, 30 LD50 results obtained in this
laboratory were calculated by the Bliss and by the Thompson methods.
The LD50s obtained are practically the same in 29 of the 30. Only in one
case was the result obtained by the Thompson method outside of the
fiducial limit range obtained by the Bliss method and even in this case
the LD50s differed by only 11%. Likewise, the respective standard
deviations obtained by the two methods are nearly the same. To illus- 
trate, four of these comparisons are listed below. 


                                                    (1.96) Standard Deviation
                                   LD50             in Logarithmic Units
                               by Method of         by Method of
Material                     Bliss    Thompson      Bliss      Thompson
Acetic acid, glacial         3.53     3.65          0.04185    0.04520
Ethyl acrylate               1.95     1.95          0.03582    0.03752
Isopropanolamine, mono       4.26     4.26          0.03944    0.04132
Fungicide 341 “A”            1.98     1.94          0.06301    0.06379

The general formulae for the estimation of the logarithm of the LD50 and
of the standard deviation of the estimate by the moving average method
are given in the original article by Thompson (1947). Explanation of
the use of the tables presented in this paper follows.

The requirements that must be followed to use these tables are: 

(a) dose a constant number of animals on each dosage level (n = the 
number dosed per level). This restriction is not necessary to use of the 
moving average interpolation method, but tables otherwise would hardly 
be worth making. There is nothing to prevent construction of similar 
tables for different values of K not used in the present article, although 
the usual choice appears to be K = 3. 

(b) space the dosage levels so that they are in a geometric progres-
sion. For example, if the geometric factor (R) is 2.0, with dosage levels
of 0.5, 1.0, 2.0, and 4.0 grams per kilogram, then d = 0.30103.

(c) dose animals on at least K + 1 levels of dosage, i.e., 4 levels or
more for K = 3. 
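
Requirements (a) to (c) lend themselves to a simple check before the tables are consulted. The sketch below is illustrative; the function name and the 1 per cent tolerance allowed on the dose ratio (to admit rounded dosage levels) are assumptions and do not come from the article. It returns d = log R when the design qualifies.

```python
import math


def check_design(doses, animals_per_level, k=3, rel_tol=0.01):
    """Verify requirements (a) to (c) and return d, the log of the dose ratio."""
    if len(doses) != len(animals_per_level):
        raise ValueError("one animal count is needed for each dosage level")
    if len(set(animals_per_level)) != 1:
        raise ValueError("(a) the number of animals must be constant")
    ratios = [b / a for a, b in zip(doses, doses[1:])]
    if any(not math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
        raise ValueError("(b) dosage levels must form a geometric progression")
    if len(doses) < k + 1:
        raise ValueError("(c) at least K + 1 dosage levels are required")
    return math.log10(ratios[0])


d = check_design([0.5, 1.0, 2.0, 4.0], [4, 4, 4, 4])
print(f"d = {d:.5f}")
```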

When these requirements are followed we seek to obtain from animals 
dosed at succeeding dosage levels a set of mortality data (r-values in the 
table) that match one of those in the table for the given value of n and K. 
In any of the tables, the two middle numbers in the ‘‘r-value” column 
may be inverted without changing the other values in that row; the 
r-values, 0,0,3,4, are identical with 0,3,0,4. For example, the r-values of 
0,0,3,4 could indicate a mortality of 0 animals of 4 dosed at 2.0 grams per 
kilogram, 0 of 4, 3 of 4, and 4 of 4 dead at 4.0, 8.0, and 16.0 grams per 
kilogram respectively. In this case the table for n = 4 and K = 3 would 
be used. The general formula for the calculation of m, the estimate of
the ED50, may be reduced to

$$\log m = \log D_a + d\,(f + 1) \quad \text{for } K = 3 \tag{1}$$


Thus, referring to the 0,0,3,4 r-value row in the n = 4 section of Table I,
f is seen to be 0.75. As d (the logarithm of the constant ratio between
dosage levels) is 0.30103 in this example and as the log D_a (the log of the
lowest of the four dosage levels used) is the logarithm of 2.0, approxi-
mately 0.30103, the log LD50 is estimated as 0.30103 + 0.30103
(1.75) or 0.82783. The LD50 is accordingly estimated as 6.73 gm./kg.
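
The arithmetic of formula (1) is easily mechanized once f has been read from Table I. The following minimal sketch repeats the calculation just given; the function name is illustrative, and common logarithms are assumed as in the text.

```python
import math


def log_ed50(lowest_dose, d, f):
    """Formula (1) for K = 3: log m = log(lowest dose) + d * (f + 1)."""
    return math.log10(lowest_dose) + d * (f + 1)


# r-values 0,0,3,4 with n = 4: f = 0.75 from Table I; doses 2, 4, 8, 16 gm./kg.
log_m = log_ed50(lowest_dose=2.0, d=0.30103, f=0.75)
m = 10 ** log_m
print(f"log LD50 = {log_m:.5f}, LD50 = {m:.2f} gm./kg.")
```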

Another example of the use of formula (1) follows. Five animals
were dosed per level (n = 5) and the following mortalities resulted:
0 of 5 at 1.26 gm./kg., 1 of 5 at 1.59 gm./kg., 3 of 5 at 2.00 gm./kg., and
5 of 5 at 2.52 gm./kg. Here, the ratio of successive dosage levels is
1.26 and d = 0.1. The log LD50 from the use of Table I is:

log LD50 = 0.10037 + 0.1 (0.7 + 1) = 0.27037
LD50 = 1.86 gm./kg.


In an estimation of a confidence interval that will encompass the
LD50 95 times in 100, we take that bounded by antilog [log m ± 2·$\sigma_{\log m}$];
the following formula is used with the σ_f value from the tables:

$$\sigma_{\log m} = d \cdot \sigma_f \tag{2}$$


Thus, in the first example, the log LD50 was estimated as log m =
0.82783. The value of 2d·σ_f is 2(0.250)(0.30103), equal to 0.150.
Therefore, the bounds of the confidence interval of log LD50 are 0.828 ±
0.15, i.e., 0.678 to 0.978. The LD50 and its 95 of 100 confidence interval
are estimated as 6.73 (4.76 to 9.51) gm./kg.

For the second example, with r-values of 0,1,3,5 and d = 0.1 (n = 5,
K = 3), the value of 2·$\sigma_{\log m}$ is equal to 2(0.31623)(0.1), about 0.063.
Therefore, the estimate of log m and its confidence interval limits are
given by 0.27037 ± 0.063. The estimate (m) of the LD50 and its range
are 1.86 (1.61 to 2.16) grams per kilogram.
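
Both worked examples can be reproduced with one short routine that combines formula (1) with the interval log m ± 2d·σ_f. The sketch below takes f and σ_f as read from Table I; the function name is illustrative. Because the routine carries more decimals than the hand calculation, the last digit of a limit may occasionally differ from a value obtained with rounded intermediate figures.

```python
import math


def ed50_with_interval(lowest_dose, d, f, sigma_f):
    """Return (estimate, lower limit, upper limit) of the ED50 for K = 3."""
    log_m = math.log10(lowest_dose) + d * (f + 1)  # formula (1)
    half_width = 2 * d * sigma_f                   # 2 * sigma of log m
    return tuple(10 ** v for v in (log_m, log_m - half_width, log_m + half_width))


# First example: r-values 0,0,3,4 (n = 4), lowest dose 2.0 gm./kg., d = 0.30103.
first = ed50_with_interval(2.0, 0.30103, f=0.75, sigma_f=0.25)
# Second example: r-values 0,1,3,5 (n = 5), lowest dose 1.26 gm./kg., d = 0.1.
second = ed50_with_interval(1.26, 0.1, f=0.7, sigma_f=0.31623)

for label, (m, low, high) in (("first", first), ("second", second)):
    print(f"{label} example: {m:.2f} ({low:.2f} to {high:.2f}) gm./kg.")
```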

The use of these tables plus the use of a table of logarithms (base 10)
allows the simple and rapid estimation of the LD50 and a corresponding
confidence interval. No curves need be plotted; in fact none are assumed.
Tables may be calculated, if desired, using other values of n or K.

TABLE I.


TABLE FOR CALCULATION OF MEDIAN-EFFECTIVE DOSE BY MOVING AVERAGE 
INTERPOLATION FOR n= 2,3,4,5O0R6AND K =3 


n=2,K =3 n=4,K =3 n=5,K =3 
r-values f oF r-values f oF r-values f oF 
0,0,1,2 1.00000 | 0.50000 || 2,0,3,4 | 0.50000 | 0.57735 || 0,1,2,5 | 0.90000 | 0.31623 
0,0,2,2 | 0.50000 | 0.00000 || 2,0,4,4 | 0.00000 | 0.57735 || 0,1,3,5 | 0.70000 | 0.31623 
0,1,1,2 | 0.50000 | 0.70711 || 2,1,1,4 | 1.00000 | 0.70711 || 0,1,4,5 | 0.50000 | 0.28284 
0,1,2,2 | 0.00000 | 0.50000 || 2,1,2,4 | 0.50000 | 0.81650 || 0,1,5,5 | 0.30000 | 0.20000 
1,0,1,2 | 1.00000 | 1.00000 || 2,1,3,4 | 0.00000 | 0.91287 || 0,2,2,5 | 0.70000 | 0.34641 
1,0,2,2 | 0.90000 | 1.00000 || 2,2,2,4 | 0.00000 | 1.00000 || 0,2,3,5 | 0.50000 | 0.34641 
1,1,1,2 | 0.00000 | 1.73205 || 3,0,2,4 1.00000 | 1.15470 || 0,2,4,5 | 0.30000 | 0.31623 
0,0,2,1 | 1.00000 | 1.00000 |} 3,0,3,4 | 0.00000 | 1.41421 || 0,2,5,5 | 0.10000 | 0.24495 
0,1,1,1 | 1.00000 | 1.73205 || 3,1,1,4 | 1.00000 | 1.41421 || 0,3,3,5 | 0.30000 | 0.34641 
0,1,2,1 | 0.00000 | 1.00000 || 3,1,2,4 | 0.00000 | 1.82574 || 0,3,4,5 | 0.10000 | 0.31623 
0,0,3,3 | 1.00000 | 0.47140 || 1,0,3,5 | 0.87500 | 0.30778 
n=3,K =3 0,0,4,3 | 0.66667 | 0.22222 || 1,0,4,5 | 0.62500 | 0.26700 
0,1,2,3 | 1.00000 | 0.60858 || 1,0,5,5 | 0.37500 | 0.15625 
0,0,2,3 | 0.83333 | 0.57735 || 0,1,3,3 | 0.66667 | 0.52116 || 1,1,2,5 | 0.87500 | 0.39652 
0,0,3,3 | 0.50000 | 0.00000 || 0,1,4,3 | 0.33333 | 0.35136 || 1,1,3,5 | 0.62500 | 0.40625 
0,1,1,3 | 0.83333 | 0.81650 || 0,2,2,3 | 0.66667 | 0.58794 || 1,1,4,5 | 0.37500 | 0.38654 
0,1,2,3 | 0.50000 | 0.81650 || 0,2,3,3 | 0.33333 | 0.52116 || 1,1,5,5 | 0.12500 | 0.33219 
0,1,3,3 | 0.16667 | 0.57735 || 0,2,4,3 | 0.00000 | 0.38490 || 1,2,2,5 | 0.62500 | 0.44304 
0,2,2,3 | 0.16667 | 0.81650 || 0,3,3,3 | 0.00000 | 0.47140 || 1,2,3,5 | 0.37500 | 0.46034 
1,0,2,3 | 0.75000 | 0.51539 |] 1,0,3,3 | 1.00000 | 0.70711 || 1,2,4,5 | 0.12500 | 0.45178 
1,0,3,3 | 0.25000 | 0.37500 || 1,0,4,3 | 0.50000 | 0.35355 || 1,3,3,5 | 0.12500 | 0.48513 
1,1,1,3 | 0.75000 | 0.71807 || 1,1,2,3 1.00000 | 0.91287 || 2,0,3,5 | 0.83333 | 0.41388 
1,1,2,3 | 0.25000 | 0.80039 || 1,1,3,3 | 0.50000 | 0.79057 || 2,0,4,5 | 0.50000 | 0.39087 
2,0,2,3 | 0.50000 | 1.11803 |/ 1,1,4,3 | 0.00000 | 0.70711 || 2,0,5,5 | 0.16667 | 0.34021 
0,0,3,2 | 0.75000 | 0.37500 || 1,2,2,3 | 0.50000 | 0.88976 || 2,1,2,5 | 0.83333 | 0.53142 
0,1,2,2 | 0.75000 | 0.80039 || 1,2,3,3 | 0.00000 | 0.91287 || 2,1,3,5 | 0.50000 | 0.56519 
0,1,3,2 | 0.25000 | 0.51539 || 2,0,3,3 | 1.00000 | 1.41421 || 2,1,4,5 | 0.16667 | 0.58134 
0,2,2,2 | 0.25000 | 0.71807 || 2,0,4,3 | 0.00000 | 1.15470 || 2,2,2,5 | 0.50000 | 0.61237 
0,1,3,1 | 0.50000 | 1.11803 |} 2,1,2,3 | 1.00000 | 1.82574 || 2,2,3,5 | 0.16667 | 0.67013 
2,1,3,3 | 0.00000 | 1.82574 || 0,0,4,4 | 0.87500 | 0.33219 
n=4,K =3 2,2,2,3 | 0.00000 | 2.00000 || 0,0,5,4 | 0.62500 | 0.15625 
0,0,4,2 | 1.00000 | 0.57735 || 0,1,3,4 | 0.87500 | 0.45178 
0,0,2,4 1.00000 | 0.28868 || 0,1,3,2 1.00000 | 0.91287 || 0,1,4,4 | 0.62500 | 0.38654 
0,0,3,4 | 0.75000 | 0.25000 || 0,1,4,2 | 0.50000 | 0.57735 || 0,1,5,4 | 0.37500 | 0.26700 
0,0,4,4 | 0.50000 | 0.00000 || 0,2,2,2 | 1.00000 | 1.00000 || 0,2,2,4 | 0.87500 | 0.48513 
0,1,1,4 1.00000 | 0.35355 || 0,2,3,2 | 0.50000 | 0.81650 || 0,2,3,4 | 0.62500 | 0.46034 
0,1,2,4 | 0.75000 | 0.38188 || 0,2,4,2 | 0.00000 | 0.57735 || 0,2,4,4 | 0.37500 | 0.40625 
0,1,3,4 | 0.50000 | 0.35355 || 0,3,3,2 | 0.00000 | 0.70711 || 0,2,5,4 | 0.12500 | 0.30778 
0,1,4,4 | 0.25000 | 0.25000 |} 1,0,4,2 1.00000 | 1.15470 || 0,3,3,4 | 0.37500 | 0.44304 
0,2,2,4 | 0.50000 | 0.40825 || 1,1,3,2 | 1.00000 | 1.82574 || 0,3,4,4 | 0.12500 | 0.39652 
0,2,3,4 | 0.25000 | 0.38188 || 1,1,4,2 | 0.00000 | 1.41421 || 1,0,4,4 | 0.83333 | 0.43744 
0,2,4,4 | 0.00000 | 0.28868 || 1,2,2,2 | 1.00000 | 2.00000 || 1,0,5,4 | 0.50000 | 0.23570 
0,3,3,4 | 0.00000 | 0.35355 || 1,2,3,2 | 0.00000 | 1.82574 |} 1,1,3,4 | 0.83333 | 0.59835 
1,0,2,4 | 1.00000 | 0.38490 || 0,2,3,1 | 1.00000 | 1.82574 || 1,1,4,4 | 0.50000 | 0.52705 
1,0,3,4 | 0.66667 | 0.35136 |] 0,2,4,1 | 0.00000 | 1.15470 || 1,1,5,4 | 0.16667 | 0.43744 
1,0,4,4 | 0.33333 | 0.22222 |! 0,3,3,1 | 0.00000 | 1.41421 || 1,2,2,4 | 0.83333 | 0.64310 
1,1,1,4 | 1.00000 | 0.47140 || 0,1,4,1 | 1.00000 | 1.41421 || 1,2,3,4 | 0.50000 | 0.62361 
1,1,2,4 | 0.66667 | 0.52116 1,2,4,4 | 0.16667 | 0.59835 
1,1,3,4 | 0.33333 | 0.52116 n=5,K =3 1,3,3,4 | 0.16667 | 0.64310 
1,1,4,4 | 0.00000 | 0.47140 2,0,4,4 | 0.75000 | 0.64348 
1,2,2,4 | 0.33333 | 0.58794 || 0,0,3,5 | 0.90000 | 0.24495 || 2,0,5,4 | 0.25000 | 0.47598 
1,2,3,4 | 0.00000 | 0.60858 || 0,0,4,5 | 0.70000 | 0.20000 || 2,1,3,4 | 0.75000 | 0.88829 
2,0,2,4 | 1.00000 | 0.57735 || 0,0,5,5 | 0.50000 | 0.00000 |} 2,1,4,4 | 0.25000 | 0.85239 
TABLE I.—Continued
n = 5, K = 3 || n = 6, K = 3 || n = 6, K = 3
r-values | f | σ_f || r-values | f | σ_f || r-values | f | σ_f
2,2,2,4 | 0.75000 | 0.95607 || 1,4,4,6 | 0.00000 | 0.36878 || 0,3,6,5 | 0.00000 | 0.26833 
2,2,3,4 | 0.25000 | 0.98821 || 2,0,3,6 | 1.00000 | 0.33541 || 0,4,4,5 | 0.20000 | 0.36000
0,0,5,3 | 0.83333 | 0.34021 || 2,0,4,6 | 0.75000 | 0.32596 || 0,4,5,5 | 0.00000 | 0.32249 
0,1,4,3 | 0.83333 | 0.58134 || 2,0,5,6 | 0.50000 | 0.29580 || 1,0,4,5 | 1.00000 | 0.40311 
0,1,5,3 | 0.50000 | 0.39087 || 2,0,6,6 | 0.25000 | 0.23717 || 1,0,5,5 | 0.75000 | 0.31869 
0,2,3,3 | 0.83333 | 0.67013 |] 2,1,2,6 | 1.00000 | 0.40311 || 1,0,6,5 | 0.50000 | 0.17678 
0,2,4,3 | 0.50000 | 0.56519 || 2,1,3,6 | 0.75000 | 0.42573 || 1,1,3,5 | 1.00000 | 0.48734 
0,2,5,3 0.16667 | 0.41388 || 2,1,4,6 | 0.50000 | 0.43301 || 1,1,4,5 0.75000 | 0.44896 
0,3,3,3 0.50000 | 0.61237 || 2,1,5,6 0.25000 | 0.42573 |} 1,1,5,5 0.50000 | 0.39528 
0,3,4,3 | 0.16667 | 0.53142 |] 2,1,6,6 | 0.00000 | 0.29580 || 1,1,6,5 | 0.25000 | 0.31869 
1,0,5,3 0.75000 | 0.47598 || 2,2,2,6 0.75000 | 0.45415 || 1,2,2,5 1.00000 | 0.51235 
1,1,4,3 | 0.75000 | 0.85239 |] 2,2,3,6 | 0.50000 | 0.48734 || 1,2,3,5 | 0.75000 | 0.50156 
1,1,5,3 | 0.25000 | 0.64348 || 2,2,4,6 | 0.25000 | 0.50621 || 1,2,4,5 | 0.50000 | 0.48088 
1,2,3,3 | 0.75000 | 0.98821 || 2,2,5,6 | 0.00000 | 0.43301 || 1,2,5,5 | 0.25000 | 0.44896 
1,2,4,3 | 0.25000 | 0.88829 || 2,3,3,6 | 0.25000 | 0.53033 |] 1,2,6,5 | 0.00000 | 0.40311 
1,3,3,3 | 0.25000 | 0.95607 || 2,3,4,6 | 0.00000 | 0.48734 |/ 1,3,3,5 | 0.50000 | 0.50621 
3,0,3,6 | 1.00000 | 0.44721 || 1,3,4,5 | 0.25000 | 0.50156
n=6,K =3 3,0,4,6 0.66667 | 0.43885 |} 1,3,5,5 0.00000 | 0.48734 
3,0,5,6 0.33333 | 0.44721 || 1,4,4,5 0.00000 | 0.51235 
0,0,3,6 1.00000 | 0.22361 || 3,0,6,6 0.00000 | 0.44721 || 2,0,4,5 1.00000 | 0.53748 
0,0,4,6 | 0.83333 | 0.21082 || 3,1,2,6 1.00000 | 0.53748 || 2,0,5,5 0.66667 | 0.42455 
0,0,5,6 0.66667 | 0.16667 || 3,1,3,6 0.66667 | 0.57090 || 2,0,6,5 0.33333 | 0.30225 
0,0,6,6 | 0.50000 | 0.00000 || 3,1,4,6 0.33333 | 0.61464 || 2,1,3,5 1.00000 | 0.64979 
0,1,2,6 1.00000 | 0.26874 || 3,1,5,6 0.00000 | 0.64979 || 2,1,4,5 0.66667 | 0.59835 
0,1,3,6 | 0.83333 | 0.27889 || 3,2,2,6 | 0.66667 | 0.60858 |] 2,1,5,5 | 0.33333 | 0.55998 
0,1,4,6 | 0.66667 | 0.26874 || 3,2,3,6 0.33333 | 0.68313 || 2,1,6,5 0.00000 | 0.53748 
0,1,5,6 | 0.50000 | 0.23570 |} 3,2,4,6 | 0.00000 | 0.74536 || 2,2,2,5 | 1.00000 | 0.68313 
0,1,6,6 | 0.33333 | 0.16667 || 3,3,3,6 0.00000 | 0.77460 || 2,2,3,5 0.66667 | 0.66852 
0,2,2,6 | 0.83333 | 0.29814 || 4,0,3,6 1.00000 | 0.67082 || 2,2,4,5 0.33333 | 0.66852 
0,2,3,6 | 0.66667 | 0.30732 || 4,0,4,6 | 0.50000 | 0.70711 || 2,2,5,5 | 0.00000 | 0.68313 
0,2,4,6 | 0.50000 | 0.29814 || 4,0,5,6 | 0.00000 | 0.80622 || 2,3,3,5 | 0.33333 | 0.70097 
0,2,5,6 | 0.33333 | 0.26874 || 4,1,2,6 | 1.00000 | 0.80622 || 2,3,4,5 | 0.00000 | 0.74536 
0,2,6,6 | 0.16667 | 0.21082 || 4,1,3,6 0.50000 | 0.89443 || 3,0,4,5 1.00000 | 0.80622 
0,3,3,6 | 0.50000 | 0.31623 || 4,1,4,6 | 0.00000 | 1.02470 || 3,0,5,5 | 0.50000 | 0.65192 
0,3,4,6 | 0.33333 | 0.30732 || 4,2,2,6 | 0.50000 | 0.94868 || 3,0,6,5 | 0.00000 | 0.67082
0,3,5,6 | 0.16667 | 0.27889 || 4,2,3,6 | 0.00000 | 1.11803 || 3,1,3,5 | 1.00000 | 0.92195 
0,3,6,6 | 0.00000 | 0.22361 || 5,0,3,6 | 1.00000 | 1.34164 |] 3,1,4,5 | 0.50000 | 0.90830 
0,4,4,6 | 0.16667 | 0.29814 || 5,0,4,6 | 0.00000 | 1.61245 || 3,1,5,5 | 0.00000 | 0.92195 
0,4,5,6 | 0.00000 | 0.26874 || 5,1,2,6 | 1.00000 | 1.61245 || 3,2,2,5 | 1.00000 | 1.02470 
1,0,3,6 1.00000 | 0.26833 |} 5,1,3,6 0.00000 | 1.94936 || 3,2,3,5 0.50000 | 1.01242 
1,0,4,6 | 0.80000 | 0.25612 || 5,2,2,6 | 0.00000 | 2.04939 || 3,2,4,5 | 0.00000 | 1.11803 
1,0,5,6 | 0.60000 | 0.21541 || 0,0,4,5 | 1.00000 | 0.26833 || 3,3,3,5 | 0.00000 | 1.16190 
1,0,6,6 | 0.40000 | 0.12000 || 0,0,5,5 | 0.80000 | 0.25612 || 4,0,4,5 | 1.00000 | 1.61245 
1,1,2,6 1.00000 | 0.32249 || 0,0,6,5 0.60000 | 0.12000 || 4,0,5,5 0.00000 | 1.61245 
1,1,3,6 | 0.80000 | 0.33704 || 0,1,3,5 1.00000 | 0.34641 || 4,1,3,5 1.00000 | 1.94936 
1,1,4,6 | 0.60000 | 0.33226 || 0,1,4,5 | 0.80000 | 0.36000 || 4,1,4,5 | 0.00000 | 2.04939 
1,1,5,6 | 0.40000 | 0.30724 || 0,1,5,5 0.60000 | 0.30724 || 4,2,2,5 1.00000 | 2.04939 
1,1,6,6 | 0.20000 | 0.25612 || 0,1,6,5 0.40000 | 0.21541 || 4,2,3,5 0.00000 | 2.23607 
1,2,2,6 | 0.80000 | 0.36000 || 0,2,2,5 | 1.00000 | 0.36878 || 0,0,5,4 | 1.00000 | 0.29580
1,2,3,6 | 0.60000 | 0.37736 || 0,2,3,5 | 0.80000 | 0.40200 || 0,0,6,4 | 0.75000 | 0.23717 
1,2,4,6 | 0.40000 | 0.37736 || 0,2,4,5 0.60000 | 0.37736 || 0,1,4,4 1.00000 | 0.43301 
1,2,5,6 | 0.20000 | 0.36000 || 0,2,5,5 | 0.40000 | 0.33226 || 0,1,5,4 | 0.75000 | 0.42573
1,2,6,6 | 0.00000 | 0.26833 |} 0,2,6,5 | 0.20000 | 0.25612 |] 0,1,6,4 | 0.50000 | 0.29580 
1,3,3,6 | 0.40000 | 0.39799 || 0,3,3,5 0.60000 | 0.39799 || 0,2,3,4 1.00000 | 0.48734 
1,3,4,6 0.20000 | 0.40200 || 0,3,4,5 0.40000 | 0.37736 || 0,2,4,4 0.75000 | 0.50621 
1,3,5,6 | 0.00000 | 0.34641 || 0,3,5,5 | 0.20000 | 0.33704 || 0,2,5,4 | 0.50000 | 0.43301 
BIOMETRICS, SEPTEMBER 1952


CALCULATION OF MEDIAN-EFFECTIVE DOSE BY MOVING AVERAGE 
INTERPOLATION FOR n = 10 AND K =3 


r-values | f | σ_f || r-values | f | σ_f || r-values | f | σ_f
0,0,5,10 | 1.0 | 0.16667 || 1,1,5,10 | | 0.21631 || 2,2,7,10 | 0.50000 | 0.26021
0,0,6,10 | 0.9 | 0.16330 || 1,1,6,10 | | 0.21419 || 2,2,8,10 | 0.37500 | 0.24694
0,0,7,10 | 0.8 | 0.15275 || 1,1,7,10 | | 0.20621 || 2,2,9,10 | 0.25000 | 0.24296
0,0,8,10 | 0.7 | 0.13333 || 1,1,8,10 | | 0.19166 || 2,2,10,10 | 0.12500 | 0.22140
0,0,9,10 | 0.6 | 0.10000 || 1,1,9,10 | | 0.16882 || 2,3,3,10 | 0.87500 | 0.27043
0,0,10,10 | 0.5 | 0.00000 || 1,1,10,10 | | 0.13354 || 2,3,4,10 | 0.75000 | 0.28106
0,1,4,10 | 1.0 | 0.19149 || 1,2,3,10 | | 0.22529 || 2,3,5,10 | 0.62500 | 0.28603
0,1,5,10 | 0.9 | 0.19436 || 1,2,4,10 | | 0.23457 || 2,3,6,10 | 0.50000 | 0.28565
0,1,6,10 | 0.8 | 0.19149 || 1,2,5,10 | | 0.23843 || 2,3,7,10 | 0.37500 | 0.27990
0,1,7,10 | 0.7 | 0.18257 || 1,2,6,10 | | 0.23715 || 2,3,8,10 | 0.25000 | 0.28260
0,1,8,10 | 0.6 | 0.16667 || 1,2,7,10 | | 0.23064 || 2,3,9,10 | 0.12500 | 0.27081
0,1,9,10 | 0.5 | 0.14142 || 1,2,8,10 | | 0.21842 || 2,3,10,10 | 0.00000 | 0.25345
0,1,10,10 | 0.4 | 0.10000 || 1,2,9,10 | | 0.19945 || 2,4,4,10 | 0.62500 | 0.29204
0,2,3,10 | 1.0 | 0.20276 || 1,2,10,10 | | 0.17151 || 2,4,5,10 | 0.50000 | 0.29756
0,2,4,10 | 0.9 | 0.21082 || 1,3,3,10 | | 0.24034 || 2,4,6,10 | 0.37500 | 0.29789
0,2,5,10 | 0.8 | 0.21344 || 1,3,4,10 | | 0.24968 || 2,4,7,10 | 0.25000 | 0.30619
0,2,6,10 | 0.7 | 0.21082 || 1,3,5,10 | | 0.25391 || 2,4,8,10 | 0.12500 | 0.30117
0,2,7,10 | 0.6 | 0.20276 || 1,3,6,10 | | 0.25331 || 2,4,9,10 | 0.00000 | 0.29166
0,2,8,10 | 0.5 | 0.18856 || 1,3,7,10 | | 0.24784 || 2,5,5,10 | 0.37500 | 0.30369
0,2,9,10 | 0.4 | 0.16667 || 1,3,8,10 | | 0.23715 || 2,5,6,10 | 0.25000 | 0.31732
0,2,10,10 | 0.3 | 0.13333 || 1,3,9,10 | | 0.22050 || 2,5,7,10 | 0.12500 | 0.31799
0,3,3,10 | 0.9 | 0.21602 || 1,3,10,10 | | 0.19637 || 2,5,8,10 | 0.00000 | 0.31458
0,3,4,10 | 0.8 | 0.22361 || 1,4,4,10 | | 0.25926 || 2,6,6,10 | 0.12500 | 0.32340
0,3,5,10 | 0.7 | 0.22608 || 1,4,5,10 | | 0.26392 || 2,6,7,10 | 0.00000 | 0.32543
0,3,6,10 | 0.6 | 0.22361 || 1,4,6,10 | | 0.26392 || 3,0,5,10 | 1.00000 | 0.23809
0,3,7,10 | 0.5 | 0.21602 || 1,4,7,10 | | 0.25926 || 3,0,6,10 | 0.85714 | 0.23536
0,3,8,10 | 0.4 | 0.20276 || 1,4,8,10 | | 0.24968 || 3,0,7,10 | 0.71429 | 0.22695
0,3,9,10 | 0.3 | 0.18257 || 1,4,9,10 | | 0.23457 || 3,0,8,10 | 0.57143 | 0.21695
0,3,10,10 | 0.2 | 0.15275 || 1,4,10,10 | | 0.21276 || 3,0,9,10 | 0.42857 | 0.18962
0,4,4,10 | 0.7 | 0.23094 || 1,5,5,10 | | 0.26907 || 3,0,10,10 | 0.28571 | 0.15587
0,4,5,10 | 0.6 | 0.23336 || 1,5,6,10 | | 0.26963 || 3,1,4,10 | 1.00000 | 0.27355
0,4,6,10 | 0.5 | 0.23094 || 1,5,7,10 | | 0.26565 || 3,1,5,10 | 0.85714 | 0.27941
0,4,7,10 | 0.4 | 0.22361 || 1,5,8,10 | | 0.25690 || 3,1,6,10 | 0.71429 | 0.28057
0,4,8,10 | 0.3 | 0.21082 || 1,5,9,10 | | 0.24287 || 3,1,7,10 | 0.57143 | 0.28074
0,4,9,10 | 0.2 | 0.19149 || 1,6,6,10 | | 0.27076 || 3,1,8,10 | 0.42857 | 0.26877
0,4,10,10 | 0.1 | 0.16330 || 1,6,7,10 | | 0.26736 || 3,1,9,10 | 0.28571 | 0.25517
0,5,5,10 | 0.5 | 0.23570 || 1,6,8,10 | | 0.25926 || 3,1,10,10 | 0.14286 | 0.23536
0,5,6,10 | 0.4 | 0.23336 || 1,7,7,10 | | 0.26450 || 3,2,3,10 | 1.00000 | 0.28965
0,5,7,10 | 0.3 | 0.22608 || 2,0,5,10 | | 0.20833 || 3,2,4,10 | 0.85714 | 0.30278
0,5,8,10 | 0.2 | 0.21344 || 2,0,6,10 | | 0.20465 || 3,2,5,10 | 0.71429 | 0.31122
0,5,9,10 | 0.1 | 0.19436 || 2,0,7,10 | | 0.19320 || 3,2,6,10 | 0.57143 | 0.31857
0,5,10,10 | 0.0 | 0.16667 || 2,0,8,10 | | 0.17237 || 3,2,7,10 | 0.42857 | 0.31536
0,6,6,10 | 0.3 | 0.23094 || 2,0,9,10 | | 0.10534 || 3,2,8,10 | 0.28571 | 0.31122
0,6,7,10 | 0.2 | 0.22361 || 2,0,10,10 | | 0.07365 || 3,2,9,10 | 0.14286 | 0.30278
0,6,8,10 | 0.1 | 0.21082 || 2,1,4,10 | | 0.23936 || 3,2,10,10 | 0.00000 | 0.28965
0,6,9,10 | 0.0 | 0.19149 || 2,1,5,10 | | 0.22902 || 3,3,3,10 | 0.85714 | 0.31018
0,7,7,10 | 0.1 | 0.21602 || 2,1,6,10 | | 0.24116 || 3,3,4,10 | 0.71429 | 0.32546
0,7,8,10 | 0.0 | 0.20276 || 2,1,7,10 | | 0.23246 || 3,3,5,10 | 0.57143 | 0.33926
1,0,5,10 | 1.0 | 0.18518 || 2,1,8,10 | | 0.21651 || 3,3,6,10 | 0.42857 | 0.34291
1,0,6,10 | 0.88889 | 0.18186 || 2,1,9,10 | | 0.19151 || 3,3,7,10 | 0.28571 | 0.34574
1,0,7,10 | | 0.17151 || 2,1,10,10 | | 0.17678 || 3,3,8,10 | 0.14286 | 0.34480
1,0,8,10 | | 0.15270 || 2,2,3,10 | | 0.25345 || 3,3,9,10 | 0.00000 | 0.34007
1,0,9,10 | | 0.12159 || 2,2,4,10 | | 0.26393 || 3,4,4,10 | 0.57143 | 0.34588
1,0,10,10 | | 0.06172 || 2,2,5,10 | | 0.26842 || 3,4,5,10 | 0.42857 | 0.35589
1,1,4,10 | | 0.21276 || 2,2,6,10 | | 0.26717 || 3,4,6,10 | 0.28571 | 0.36488
TABLE II.—Continued


r-values | f | σ_f || r-values | f | σ_f || r-values | f | σ_f

3,4,7,10 | 0.14286 | 0.37017 || 5,3,4,10 | 0.60000 | 0.46667 || 9,0,6,10 | 0.00000 | 1.77951 
3,4,8,10 | 0.00000 | 0.37192 |] 5,3,5,10 | 0.40000 | 0.48990 || 9,1,4,10 | 1.00000 | 1.77951 
3,5,5,10 | 0.28571 | 0.37104 || 5,3,6,10 | 0.20000 | 0.52068 || 9,1,5,10 | 0.00000 | 2.18581 
3,5,6,10 | 0.14286 | 0.38223 || 5,3,7,10 | 0.00000 | 0.54569 || 9,2,3,10 | 1.00000 | 2.02759 
3,5,7,10 | 0.00000 | 0.38978 || 5,4,4,10 | 0.40000 | 0.49889 || 9,2,4,10 | 0.00000 | 2.33333 
3,6,6,10 | 0.00000 | 0.39555 || 5,4,5,10 | 0.20000 | 0.53748 || 9,3,3,10 | 0.00000 | 2.38048 
4,0,5,10 | 1.00000 | 0.27778 || 5,4,6,10 | 0.00000 | 0.56960 || 0,0,6,9 | 1.00000 | 0.21276 
4,0,6,10 | 0.83333 | 0.27592 || 5,5,5,10 | 0.00000 | 0.57735 || 0,0,7,9 | 0.88889 | 0.19637 
4,0,7,10 | 0.66667 | 0.27027 || 6,0,5,10 | 1.00000 | 0.41667 || 0,0,8,9 0.77778 | 0.17151 
4,0,8,10 | 0.50000 | 0.26058 || 6,0,6,10 | 0.75000 | 0.45644 || 0,0,9,9 | 0.66667 | 0.13354 
4,0,9,10 | 0.33333 | 0.24637 || 6,0,7,10 | 0.50000 | 0.43301 0,0,10,9 | 0.55556 | 0.06172 
4,0,10,10| 0.16667 | 0.22680 || 6,0,8,10 | 0.25000 | 0.45262 || 0,1,5,9 | 1.00000 | 0.24287 
4,1,4,10 | 1.00000 | 0.31914 |] 6,0,9,10 | 0.00000 | 0.47871 || 0,1,6,9 | 0.88889 | 0.23457 
4,1,5,10 | 0.83333 | 0.32710 || 6,1,4,10 | 1.00000 | 0.47871 || 0,1,7,9 | 0.77778 | 0.22050 
4,1,6,10 | 0.66667 | 0.33178 || 6,1,5,10 | 0.75000 | 0.52705 || 0,1,8,9 | 0.66667 | 0.19945 
4,1,7,10 | 0.50000 | 0.33333 || 6,1,6,10 | 0.50000 | 0.52042 || 0,1,9,9 | 0.55555 | 0.16882 
4,1,8,10 | 0.33333 | 0.33178 || 6,1,7,10 | 0.25000 | 0.54962 || 0,1,10,9 | 0.44444 | 0.12159 
4,1,9,10 | 0.16667 | 0.32710 || 6,1,8,10 | 0.00000 | 0.58333 || 0,2,4,9 1.00000 | 0.25926 
4,1,10,10] 0.00000 | 0.31914 || 6,2,3,10 | 1.00000 | 0.50690 || 0,2,5,9 | 0.88889 | 0.25690 
4,2,3,10 | 1.00000 | 0.33793 || 6,2,4,10 | 0.75000 | 0.56519 || 0,2,6,9 | 0.77778 | 0.24968 
4,2,4,10 | 0.83333 | 0.35428 || 6,2,5,10 | 0.50000 | 0.57130 || 0,2,7,9 | 0.66667 | 0.23715 
4,2,5,10 | 0.66667 | 0.36711 || 6,2,6,10 | 0.25000 | 0.60953 || 0,2,8,9 | 0.55556 | 0.21842 
4,2,6,10 | 0.50000 | 0.37679 || 6,2,7,10 | 0.00000 | 0.65085 || 0,2,9,9 | 0.44444 | 0.19166 
4,2,7,10 | 0.33333 | 0.38356 || 6,3,3,10 | 0.75000 | 0.57735 || 0,2,10,9 | 0.33333 | 0.15270 
4,2,8,10 | 0.16667 | 0.38756 || 6,3,4,10 | 0.50000 | 0.59512 || 0,3,3,9 1.00000 | 0.26450 
4,2,9,10 | 0.00000 | 0.38889 || 6,3,5,10 | 0.25000 | 0.64280 || 0,3,4,9 | 0.88889 | 0.26736 
4,3,3,10 | 0.83333 | 0.36289 || 6,3,6,10 | 0.00000 | 0.69222 || 0,3,5,9 0.77778 | 0.26565 
4,3,4,10 | 0.66667 | 0.38356 || 6,4,4,10 | 0.25000 | 0.65352 || 0,3,6,9 0.66667 | 0.25926 
4,3,5,10 | 0.50000 | 0.40062 || 6,4,5,10 | 0.00000 | 0.71200 || 0,3,7,9 | 0.55556 | 0.24784 
4,3,6,10 | 0.33333 | 0.41450 || 7,0,5,10 | 1.00000 | 0.55556 || 0,3,8,9 | 0.44444 | 0.23064 
4,3,7,10 | 0.16667 | 0.42552 || 7,0,6,10 | 0.66667 | 0.57013 || 0,3,9,9 | 0.33333 | 0.20621 
4,3,8,10 | 0.00000 | 0.43390 || 7,0,7,10 | 0.33333 | 0.61195 || 0,3,10,9 | 0.22222 | 0.17151 
4,4,4,10 | 0.50000 | 0.48025 || 7,0,8,10 | 0.00000 | 0.67586 || 0,4,4,9 | 0.77778 | 0.27076 
4,4,5,10 | 0.33333 | 0.42913 || 7,1,4,10 | 1.00000 | 0.63828 || 0,4,5,9 | 0.66667 | 0.26963 
4,4,6,10 | 0.16667 | 0.44675 || 7,1,5,10 | 0.66667 | 0.66975 || 0,4,6,9 | 0.55556 | 0.26392 
4,4,7,10 | 0.00000 | 0.46148 || 7,1,6,10 | 0.33333 | 0.72293 || 0,4,7,9 | 0.44444 | 0.25331 
4,5,5,10 | 0.16667 | 0.45361 7,1,7,10 | 0.00000 | 0.79349 || 0,4,8,9 0.33333 | 0.23715 
4,5,6,10 | 0.00000 | 0.47466 || 7,2,3,10 | 1.00000 | 0.67586 || 0,4,9,9 0.22222 | 0.21419 
5,0,5,10 | 1.00000 | 0.33333 || 7,2,4,10 | 0.66667 | 0.72293 || 0,4,10,9 | 0.11111 | 0.18186 
5,0,6,10 | 0.80000 | 0.33333 || 7,2,5,10 | 0.33333 | 0.78829 || 0,5,5,9 | 0.55556 | 0.26907 
5,0,7,10 | 0.60000 | 0.33333 || 7,2,6,10 | 0.00000 | 0.86780 || 0,5,6,9 0.44444 | 0.26392 
5,0,8,10 | 0.40000 | 0.33333 || 7,3,3,10 | 0.66667 | 0.73981 || 0,5,7,9 | 0.33333 | 0.25391 
5,0,9,10 | 0.20000 | 0.33333 || 7,3,4,10 | 0.33333 | 0.81901 || 0,5,8,9 | 0.22222 | 0.23843 
5,0,10,10) 0.00000 | 0.33333 || 7,3,5,10 | 0.00000 | 0.90948 || 0,5,9,9 0.11111 | 0.21631 
5,1,4,10 | 1.00000 | 0.38297 || 7,4,4,10 | 0.00000 | 0.92296 || 0,5,10,9 | 0.00000 | 0.18518 
5,1,5,10 | 0.80000 | 0.39440 || 8,0,5,10 | 1.00000 | 0.83333 || 0,6,6,9 | 0.33333 | 0.25926 
5,1,6,10 | 0.60000 | 0.40552 || 8,0,6,10 | 0.50000 | 0.88192 || 0,6,7,9 | 0.22222 | 0.24968 
5,1,7,10 | 0.40000 | 0.41633 || 8,0,7,10 | 0.00000 | 1.01379 || 0,6,8,9 0.11111 | 0.23457 
5,1,8,10 | 0.20000 | 0.42687 || 8,1,4,10 | 1.00000 | 0.95743 || 0,6,9,9 | 0.00000 | 0.21276 
5,1,9,10 | 0.00000 | 0.43716 || 8,1,5,10 | 0.50000 | 1.02740 || 0,7,7,9 | 0.11111 | 0.24034 
5,2,3,10 | 1.00000 | 0.40552 || 8,1,6,10 | 0.00000 | 1.16667 || 0,7,8,9 | 0.00000 | 0.22529 
5,2,4,10 | 0.80000 | 0.42688 || 8,2,3,10 | 1.00000 | 1.01379 || 1,0,6,9 | 1.00000 | 0.23936 
5,2,5,10 | 0.60000 | 0.44721 || 8,2,4,10 | 0.50000 | 1.10554 || 1,0,7,9 | 0.87500 | 0.22060
5,2,6,10 | 0.40000 | 0.46667 || 8,2,5,10 | 0.00000 | 1.25830 || 1,0,8,9 | 0.75000 | 0.19376
5,2,7,10 | 0.20000 | 0.48534 || 8,3,3,10 | 0.50000 | 1.13039 || 1,0,9,9 | 0.62500 | 0.15468
5,2,8,10 | 0.00000 | 0.50332 || 8,3,4,10 | 0.00000 | 1.30171 || 1,0,10,9 | 0.50000 | 0.08838
5,3,3,10 | 0.80000 | 0.43716 || 9,0,5,10 | 1.00000 | 1.66667 || 1,1,5,9 | 1.00000 | 0.27323
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.66667 
33333 
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66667 
33333 
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0.33333 


|| 
f os | | | r-values f oF 
4,5,6 | 1.00000 | g.94933 || 0,6,7,5 | 0 | 0.00000 
4,6,6 | 0.66667 | [§.86780 || 0,6,8,5 10 1.00000 | 
4,7,6 | 0.33333 | 9.83887 || 0,6,9,5 0 0.50000 | #25277 
4,8,6 | 0.00000 | §.86780 || 0,7,7,5 | 0 0.00000 
|) ee 5,5,6 | 0.66667 | H.88192 || 0,7,8,5 0 1.00000 | 1.44388 
5,6,6 || 1,0,10,5 | 0 0.50000 
,5,7,6 0948 || 1,1,9,5 | 0 0.00000 | 1.36422 
,6,6,6 b2296 || 1,1,10,: 1000 | 0 0.00000 | 1.38444 
i 0,9,6 95743 || 1,2,8,5 000 | 0 1.00000 | 1.66667 
0,10,6 57735 || 1,2,9,5 }000 | 0 1.00000 | 2.18581 
0.20, 16667 || 1,2, 10,: 000 | 0 0.00000 | 2.23607 
,1,9,6 91287 || 1,3,7,5 | 0 1.00000 | 2.51661 
,1,10,m 95743 || 1,3,8,5 | 0.00000 | 2.60342 
19024 || 1,3,9,5 )000 | 1.00000 | 2.72845 
10554 || 1,3, 10, 000 | 0 0.00000 | 2.84800 
16667 || 1,4,6.5 1000 | 1.00000 | 2.84800 
28019 || 1,4,7 000 | 0 0.00000 | 3.00000 
; 22474 || 1,4,8 000 | 1.00000 | 2.88675 
19024 |} 1,4,9 000 0.00000 | 3.07318 
‘ 32288 || 1,4,1 1000 | ¢ 1.00000 | 0.47871 
7 29099 || 1, 1000 | ¢ 1.00000 | 0.58333 
; 28019 || 1 3000 | ¢ 0.75000 | 0.45262 
eee 31233 || 1 000 | « 1.00000 | 0.65085 
goes: 32288 || 1 5000 | ( 0.75000 | 0.54962 
23607 || 1 1000 | 0.50000 | 0.53301 
66667 || 1 000 | ¢ | §,4,7,4 | 1.00000 | 0.69222 
aL i 60342 || 1 5000 | ¢ || 9,4,8,4 | 0.75000 | 0.60953 
18581 |} 1 000 | ¢ 9,4,9,4 | 0.50000 | 0.52042 
28 84800 || 1 1000 | ¢ || §,4,10,4 | 0.25000 | 0.45644 
51661 ,10, 1000 | ¢ || #,5,6,4 | 1.00000 | 0.71200 
00000 59,5 000 | || | 0.75000 | 0.64280 
72845 ,10, 6667 | ,5,8,4 | 0.50000 | 0.57130 
07318 || 1000 ,5,9,4 | 0.25000 | 0.52705 
yes 84800 19,5 6667 | 0 0,5,10,m | 0.00000 | 0.41667 
88675 3333 0,6,6, 0.75000 | 0.65352 
{op 33333 0000 | 0 0,6,7, 0.50000 | 0.59512 
Wt 43716 ] 6667 | 0 0,6,8, 0.25000 | 0.56519 
iP ,10,5 33333 | 3333 | 0 0,6,9, | 0.00000 | 0.47871 
| 50332 000 | 0 0,7,7; 0.25000 | 0.57735 
42687 0000 | 0 0,7,8, 0.00000 | 0.50690 
, ,10,5 | 33333 || 6667 | 0 || 1,1,10,4 | 1.00000 | 0.63828 
;, Fa: 54569 | 3333 | 0 || 1,2,9,4 | 1.00000 | 0.77778 
‘ ,8,5 48534 || 0000 | 0 1,2,10,4 | 0 0.58443 
9,5 | 41633 0000 | 0 | 1,3,8,4 | 1 0.79349 
- ,10,5 33333 | 6667 | 0 1,3,9,4 | 0 0.71722 
inf Cus 56960 3333 | 0 1,3,10,4 | 0 0.58443 
7 52068 | 0000 | 1,4,7,4 | 1 0.85346 
ra 8 46667 | 3333 | 0 1,4,8,4 | 0 | 0.79866 
40552 || 0000 | 0. || 1,4,9,4 | 0 | 0.69979 
33333 || 0000 | || 1,4,10g4 | 0 0.63828 
57735 || 0000 | || 1,5,6,8 1 | 0.88192 
ms 53748 || 0000 | 0 1,5,7,8 | 0 | 0.84376 
5 48990 || b0000 1 | 1,5,8, 0 | 0.76712 
i of 5 44721 || 79,5 0000 | 1 1,5,9, 0 0.72860 
5 39440 ,10, 10000 | 1 1,6,6, 0 0.85827 
5 33333 0000 | 1 1.6.7, | (0.79866 
.49889 0000 | 1. 16,8; 0.00000 | 0.77778 
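
The legible entries of Tables I and II appear consistent with a simple rule for f: form the two successive moving averages of span K from the K + 1 response counts, and interpolate linearly to the 50 percent point. The sketch below is an inference from the tabulated values, not a formula quoted from the text; the function name and argument layout are hypothetical, and σ_f is not computed.

```python
from fractions import Fraction


def interpolation_fraction(r, n):
    """Return f for r-values r (responders at K + 1 successive doses, n per dose).

    Assumed rule: the moving averages of span K are sum(r[:-1]) / (n * K) and
    sum(r[1:]) / (n * K); f is the fraction of the way from the first to the
    second at which the moving average equals one half.
    """
    k = len(r) - 1                 # span of the moving average
    lower = sum(r[:-1])            # responders in the first K doses
    upper = sum(r[1:])             # responders in the last K doses
    if upper == lower:
        raise ValueError("moving averages are equal; f is undefined")
    f = (Fraction(n * k, 2) - lower) / (upper - lower)
    if not 0 <= f <= 1:
        raise ValueError("the 50 percent point lies outside this dose span")
    return f


# Spot checks against legible table entries.
print(float(interpolation_fraction((0, 0, 2, 3), n=3)))    # tabulated as 0.83333
print(float(interpolation_fraction((1, 0, 6, 10), n=10)))  # tabulated as 0.88889
```

Exact fractions are used so that the result can be compared with the five-decimal entries without rounding drift.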
QUERIES


GEORGE W. SNEDECOR, Editor


QUERY 95: Soluble matter in 4 extract solutions was determined by
pipetting, in duplicate, 25, 50 and 100 ml. volumes of solution into
dishes, then evaporating and weighing the residues. The work was
replicated by repeating it on 3 days. The problem is to learn whether
the volume of solution pipetted was significant and also whether the
method in general was reliable and could be checked from day to day.
Differences between extracts were not of interest. The following analysis
of variance resulted from the data. How is significance tested?


ANSWER: Before attempting to answer the questions posed, it is
necessary to make certain assumptions about the populations being
sampled. The mathematical model which seems to represent the true
situation most realistically is taken to be


$$Y_{ijkm} = e_i + v_j + d_k + (ev)_{ij} + (ed)_{ik} + (vd)_{jk} + (evd)_{ijk} + \epsilon_{ijkm}$$

where

$e_i$ = the true effect of the i-th extract
$v_j$ = the true effect of the j-th volume
$d_k$ = the true effect of the k-th day
$(ev)_{ij}$ = the true effect of the interaction between the i-th extract
and the j-th volume
$(ed)_{ik}$ = the true effect of the interaction between the i-th extract
and the k-th day
$(vd)_{jk}$ = the true effect of the interaction between the j-th volume
and the k-th day
$(evd)_{ijk}$ = the true effect of the interaction among the i-th extract,
the j-th volume and the k-th day
$\epsilon_{ijkm}$ = the true effect of the m-th duplicate on the (ijk)-th
treatment combination, plus all extraneous (random) effects.
As may be seen from an examination of the expected mean squares,
the assumption has been made that we are interested in only the four 
extracts and only the three volumes used in the experiment. However, 
the days are considered to be a random sample of days. 
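
With extracts and volumes fixed and days random, one common convention (the restricted mixed model, assumed here because the expected mean squares are not reproduced in this copy) tests each fixed effect against its interaction with days, and every term involving days against the duplicate error. The sketch below encodes that choice of denominators and the degrees of freedom implied by 4 extracts, 3 volumes, 3 days and duplicate determinations. The mean squares in the usage lines are invented purely to show the call; they are not the data of the query.

```python
# Levels of each factor in the design described in the query.
LEVELS = {"E": 4, "V": 3, "D": 3}
DUPLICATES = 2

# Assumed denominators: E and V fixed, D random, restricted mixed model.
DENOMINATOR = {
    "E": "ED", "V": "VD", "EV": "EVD",
    "D": "Error", "ED": "Error", "VD": "Error", "EVD": "Error",
}


def degrees_of_freedom():
    """Degrees of freedom for every line of the analysis of variance."""
    df = {}
    for term in DENOMINATOR:
        product = 1
        for factor in term:            # e.g. "EV" -> (4 - 1) * (3 - 1)
            product *= LEVELS[factor] - 1
        df[term] = product
    cells = LEVELS["E"] * LEVELS["V"] * LEVELS["D"]
    df["Error"] = cells * (DUPLICATES - 1)
    return df


def f_ratios(mean_squares):
    """Return {term: (F, numerator df, denominator df)}."""
    df = degrees_of_freedom()
    return {
        term: (mean_squares[term] / mean_squares[den], df[term], df[den])
        for term, den in DENOMINATOR.items()
    }


# Hypothetical mean squares, for illustration only.
example = {"E": 9.0, "V": 6.0, "D": 20.0, "EV": 1.5,
           "ED": 4.0, "VD": 2.0, "EVD": 1.0, "Error": 0.5}
for term, (f, df1, df2) in f_ratios(example).items():
    print(f"{term:4s} F = {f:6.2f} on {df1} and {df2} d.f.")
```

Under a different convention for the mixed model the test for days would use a different denominator, so the mapping above should be checked against the expected mean squares actually adopted.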

It is strongly indicated that the three volumes do not, on the average, 
have any appreciable effect on the determination of the amount of soluble 
matter in the four extracts. Also, this same result is consistently
evident for each extract as indicated by the non-significance of the EV 
interaction. However, the VD interaction is significant at the five per- 
cent level indicating that differences among the three volumes are not 
consistent from day to day. 

This last result suggests that day differences may be great and this 
is verified by noting that the mean square for days is quite large (signifi- 
cant at the one percent level). The ED interaction is also significant 
at the one percent level. Since the effects due to days, extracts × days
and volumes × days are all significant, it appears that the method is, at
present, unreliable in that differences among volumes will not be the 
same on different days. 

One possible explanation of the excessive variation among days may 
be the changes in concentration of the extract solutions because of 
temperature changes. That is, if the temperature increases markedly, 
the volume of solution will increase but the soluble matter will remain 
constant. This would mean that the same size (volume) samples 
pipetted on different days would contain varying amounts of soluble 
matter thus leading to wide variation among the results associated with 
the different days. Of course, if new solutions were prepared each day 
(with proper control of the soluble matter content), this difficulty might 
be avoided. However, in such a case, it is more than likely that failure 
to produce like extract solutions on successive days would again lead 
to excessive variation among days. 

Since differences among extracts were not of interest, discussion of 
this factor has been curtailed to a minimum. It is suggested that separate
analyses might be run for each extract but it is doubtful if such a pro- 
cedure will in any way change the conclusions presented above. 
ABSTRACTS


THE BIOMETRIC SOCIETY—WNAR 


189 ARCHIE R. TUNTURI, M. D. (University of Oregon Medical School).
The Auditory Cortex—A Probability Model.* Preliminary report.


The role played by the brain in communication is well known, but 
in what manner the brain handles information is not understood. Some 
progress has been made in this direction by studying the anatomy and 
physiology of the auditory cortex in the anesthetized dog with controlled 
acoustic signals. Communication may be thought of as making a repre- 
sentation in a space of a representation in another space. In three of 
the four auditory areas (on one side of the brain), the entire frequency 
spectrum from 100 to 12800 cps is represented literally spacewise by 
groups of cells that respond only to a narrow range of frequencies. A 
special method increases the signal to noise ratio, by augmenting the
electrical response of the cells, thereby permitting exact measurements 
of the characteristic frequency and intensity for each group of cells. This 
is similar to a narrow band filter, and does not reveal the effect of other 
frequencies on the information. The information capacity of the system 
can be inferred if it can be assumed that occurrence of the augmented 
response for the group of cells follows some probability function. These 
probabilities for all groups of cells can be assembled into a model repre- 
senting the behavior of the system as a communication device. If 
there are 70 groups of cells between 100 and 12800 cps, the probability 
of any particular combination would be 1/2^70, if the selections were
equally probable. The effect of noise on this system will be considered. 


*Aided by ONR.


190 STUART C. DODD, RICHARD J. HILL, SUSAN HUFFAKER.* (Washington
Public Opinion Laboratory, University of Washington). Testing Message
Diffusion: The Utilization of Mathematical Models.


In connection with the study of interpersonal verbal communication, 
the Washington Public Opinion Laboratory designed an experimental 
procedure which yielded data on the temporal diffusion of thirty-three 
different messages in a population of 184 individuals. Data (including 
the recipient of each message, the initiator of communication, and the 
time of communication) were obtained on 5,522 separate instances of 
communication. The experimental population was selected so that the 
assumption could reasonably be made that it was homogeneous relative 
to the interacting (i.e., each individual had an equal probability of 
interacting), and uninfluenced by the spatial factor in diffusion. In the 
attempt to describe the temporal diffusion of the messages, three rational 
mathematical models were compared with the empirical results. The 
models utilized were the logistic, the normal ogive, and the binomial 
distribution, for which the mathematical conditions are well identified.
The experiment had been designed to match those social conditions 
underlying the communication with the mathematical conditions under- 
lying the logistic model (namely, a constant probability of interacting in 
the population and through time). The higher and lower degrees of 
descriptive closeness and goodness of fit for the three models, and for 
ordinal and cardinal time units separately, are reported and discussed. 
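
The abstract does not give the form of the logistic model. As a rough illustration only, the sketch below uses the standard logistic growth curve, the solution of dp/dt = k p (1 - p), in which the fraction p of the population that has received a message grows at a rate proportional to contacts between those who have it and those who do not. The parameter names `p0` (initial fraction informed) and `k` (contact rate) are assumptions of this sketch, not quantities reported by the authors.

```python
import math


def logistic_fraction(t, p0, k):
    """Fraction informed at time t under logistic growth from initial fraction p0."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    return 1.0 / (1.0 + ((1.0 - p0) / p0) * math.exp(-k * t))


# Illustrative values only: one initial knower in a population of 184.
for t in range(0, 25, 4):
    print(t, round(logistic_fraction(t, p0=1 / 184, k=0.5), 4))
```

A fit to observed diffusion data would estimate `k` (and possibly `p0`) from the recorded times of communication; that step is not shown.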


191 A. BUZZATI-TRAVERSO. (University of Pavia, Italy and University
of California, La Jolla). The Quantification of Evolutionary Phenomena.


The quantification of evolutionary phenomena made possible by the 
introduction of statistical methods and of mathematical models has
opened a new era in the study of evolution. Examples taken from recent 
paleontological literature show that it is now possible to analyze prob- 
lems like that of the rate of evolution during geological time. Examples 
taken from experimental studies of the author show that the development
of a mathematical theory of evolution has made it possible to
observe directly the occurrence of elementary evolutionary changes in 
the course of a few generations. Special statistical or mathematical
approaches can be developed for the direct study of experimental data.


*Abstract of paper to be presented at the Western Biometrics Conference at the University of
Oregon, Eugene, Oregon, June 19-21. The paper is to be given by Richard J. Hill, Research Sociologist,
Washington Public Opinion Laboratory, University of Washington, Seattle, Washington.

The author points out that current mathematical theories are ade- 
quate only for simple cases where changes in the frequencies of single 
genes are involved, or where characters controlled by a small number of 
genes showing no complex interactions are studied. When, however, 
polygenic systems or whole genotypes are considered in their relation to 
evolutionary factors, the present models are quite insufficient. A series 
of examples are given from experiments of the author to illustrate the 
type of difficulties encountered. Since changes affecting such metric, 
polygenic characters seem to have played the most important role in the 
course of evolution, it seems very much worthwhile to attempt to develop 
new mathematical models for interpreting recent results. The author 
makes an appeal to mathematical colleagues for cooperation on this 
subject. 


192 C. O. JUNGE, JR. (University of Washington). Confidence Intervals
on the Slopes of Regression Lines When Both Variables Are Subject to
Error.


In the problem of linear regression when both variables are subject 
to error, Reiersol has shown that if the errors and the “steady” parts are
normally distributed then the slope of the desired regression line is not 
identifiable. If the errors in the two variables are independent, however, 
a conservative confidence interval on the slope of the regression line can 
be determined even though the slope is not identifiable. 

If the “steady” parts are a set of fixed but unknown values, inde- 
pendent estimates of the rank of the steady parts are sufficient to de- 
termine a meaningful confidence interval. 

In case an instrumental variable as defined by Geary is available, a 
confidence interval can be determined. 


193 JULIUS A. JAHN. (Washington State College). Statistical 
Estimation from Time and Population Samples. 


Given a population of individuals, I: 1, 2, 3, ..., i, ..., N_I, and a
variable V_i defined over this population; a random sampling procedure
is to be designed to provide unbiased estimates of the mean value of V_i
for the population with a minimum variance of the estimated values for
certain fixed sample sizes and other cost conditions. When the variable
is dependent upon time, either by definition or by hypothesis (e.g.
Growth, Learning, Attitudes, Income, or Expenditures), the use of
estimates from samples enumerated during one period of time to apply to
later periods of time can lead to large and increasing bias due to changes 
with time, even though the original sample is large and selected in a 
random manner. 

Such biases can be eliminated or reduced to a negligible amount, 
obviously, by selecting and enumerating a random sample every time an 
estimate for a certain period is desired. This solution, however, is seldom 
adopted because of the large and increasing costs involved. Another 
solution which has been attempted is to draw a relatively small sample 
of individuals at one point of time, and to repeatedly enumerate the 
same individuals over relatively long periods of time. This sampling 
method, called “Panel sampling”, however, has not in practice elimi- 
nated or reduced the biases or costs because of the large and increasing 
percentage of the sampled individuals who are not enumerated with each 
successive attempt to do so over long periods of time. 

The solution proposed in this paper is to sub-divide the total period 
of time into a series of conveniently short intervals, T: 1, 2, 3, ..., t, ..., N_T,
and to define as the universe to be sampled, the set of points (it) formed
by the combinations of the identification numbers for time intervals (t)
and those for the individuals in the population (i). Correspondingly,
the variable is defined over these points (V_it) in such a way that the
sum over the time intervals for any one individual is equal to V_i. A
sample design incorporating the use of weekly time intervals as “strata”
within which were selected a random sample of points in each stratum and
a constant sampling rate for each stratum was applied to derive estimates
of the mean value of the total expenditures of students at the University 
of Washington during the Winter Quarter of 1951. By means of this 
sample design, the percentage of individuals in the sample who were not 
interviewed to obtain necessary information was reduced to less than 
10 percent. An analysis of the results also indicated that the use of 
“optimal sampling rates’ instead of constant sampling rates would lead 
to a decrease in the variances of the estimated means. In addition, 
estimates of the variances between the means of individuals for the 
different intervals of time as compared to the variances between indi- 
viduals within the same intervals of time indicated that the use of 
shorter intervals of time, such as one day, as “clusters” of points to be 
sub-sampled, in addition a stratification, would further reduce the vari- 
ances of estimation for a fixed sample size. 
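
The design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's procedure: the function name `estimate_mean_total`, the data layout `values[i][t]`, and the rounding of the per-stratum sample size are all hypothetical. Each week is treated as a stratum, the same fraction of individuals is drawn independently in every week, and the weekly sample means are summed, which estimates the mean of $V_i$ because the mean of $V_i$ equals the sum over $t$ of the mean of $V_{it}$.

```python
import random


def estimate_mean_total(values, rate, rng):
    """Estimate the mean of V_i = sum over t of V_it from a stratified sample.

    values[i][t] holds V_it for individual i in time interval t (hypothetical layout).
    rate is the constant sampling rate applied within every time stratum.
    """
    n_individuals = len(values)
    n_intervals = len(values[0])
    # Same sample size in every stratum, since the rate is constant.
    n_sample = max(1, round(rate * n_individuals))

    estimate = 0.0
    for t in range(n_intervals):
        # Draw a fresh simple random sample of individuals for this interval.
        sampled = rng.sample(range(n_individuals), n_sample)
        stratum_mean = sum(values[i][t] for i in sampled) / n_sample
        # The mean total per individual is the sum of the per-interval means.
        estimate += stratum_mean
    return estimate


# Toy data: 4 individuals observed over 3 intervals.
toy = [[i + t for t in range(3)] for i in range(4)]
full = estimate_mean_total(toy, 1.0, random.Random(0))
half = estimate_mean_total(toy, 0.5, random.Random(0))
```

Because a different set of individuals is drawn in each interval, no individual has to be followed for the whole period, which is the feature the abstract credits with keeping non-response low.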
194 D. G. CHAPMAN. (University of Washington). The Analysis
of Samples from Mixed Populations.


Samples from natural populations will frequently include a mixture 
of several races or age groups. The analysis of such a sample into its 
component parts is useful for the determination of growth rates, mortali- 
ties, or the characteristics of the mixed populations. Several cases arise 
according to whether or not a simple age identifying factor is available, 
or whether or not supplementary information is available from other 
samples. Neyman’s extended $\chi^2$ formulae can be applied to these
problems to analyze the mixed samples into their component parts, and to
estimate characteristics of the mixed population. Auxiliary problems 
that arise in utilizing the supplementary information are considered. 


195 HAROLD TIVEY, B. A. (University of Oregon Medical School,
Portland, Oregon). Some Applications of the Log-Normal Distribution
to the Study of Survival Times in Chronic Fatal Diseases,
Particularly Leukemia.


Illustrations of some of the types of disease processes in which a 
log-normal distribution will closely approximate the distribution of 
survival times will be presented. 

A summary of survival data on over 2600 cases of leukemia taken 
from the literature will be briefly presented, with particular emphasis on 
the statistical problems arising from this investigation. The application 
of the Maximum Likelihood Method of Boag will be discussed. 
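
As a rough illustration of what fitting a log-normal distribution to survival times involves, the sketch below computes the maximum likelihood estimates for fully observed (uncensored) survival times. This is not Boag's method, which the abstract does not describe; the function names and the sample values are hypothetical.

```python
import math


def fit_lognormal(times):
    """Maximum likelihood fit of a log-normal to complete survival times.

    Returns (mu, sigma), the mean and standard deviation of log(time).
    Assumes every time is positive and fully observed (no censoring).
    """
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu = sum(logs) / n
    # The ML estimate of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def lognormal_survival(t, mu, sigma):
    """Probability of surviving beyond time t under the fitted log-normal."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical survival times, chosen so that log(time) is 1, 2 and 3.
mu, sigma = fit_lognormal([math.exp(1), math.exp(2), math.exp(3)])
median_survival = math.exp(mu)
```

Under this model the median survival time is `exp(mu)`, so the fitted survival probability at that time is one half.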


196 G. A. BAKER. (University of California). Field Trial Problems.


Theoretical and empirical examinations of the correspondence of 
certain field trials to conventional mathematical models have been made. 
The correspondence may be quite good but on the other hand serious 
defections do exist. It seems apparent that a careful study of field plot
design and methods of analysis is necessary in order to obtain more
accurate measures of plant performance on definite plots.

The classical mathematical models have led to marked advances in 
the theory and practice of field trials and no attempt is being made here 
to belittle their importance. The point that is being made is that when 
finer distinctions between varieties and treatments are attempted even 
minor deviations from mathematical theory are important. Every 
effort must be made to improve investigative tools. This can be done in
two ways. One is to develop more realistic models. This often fails 
because of lack of detailed knowledge of uniformity trials, inability to 
predict the behavior of biological material in complex situations, and 
because of mathematical difficulties. The other general method of im- 
proving field trials is to arrange the plots, select the plants, modify overall 
conditions, or transform the data so that classical models will clearly 
indicate differences. 


197 CALVIN F. SCHMID, VINCENT A. MILLER, and WARREN
E. KALBACH. (University of Washington). Population Forecasts,
State of Washington: 1950-1960 Methodological Summary.


The cohort-survival method was used to prepare the population fore- 
casts for the State of Washington, April 1, 1955 and April 1, 1960, based 
on detailed data covering (1) an age-sex breakdown of the population 
on which projections are to be based; (2) age-sex specific mortality 
trends; (3) age-specific fertility trends for females in child-bearing ages; 
(4) age-sex-specific migration trends. Estimates are made of the age-sex
distribution of the population as of the forecast dates.
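
One projection step of a cohort-survival calculation can be sketched as follows. This is a simplified illustration, not the authors' procedure: the function names, the use of equal-width age groups with an open-ended last group, and all rates in the example are hypothetical. Each group is aged forward by its survival proportion, births are computed from age-specific fertility applied to females, and net migration is added by age and sex.

```python
def age_forward(pop, survival, net_migration, births):
    """Advance one sex by one age-group width.

    pop[a] is the population in age group a; survival[a] is the proportion
    of group a surviving the step. The last group is open-ended, so its
    survivors remain in it. births enter the youngest group.
    """
    k = len(pop)
    new = [0.0] * k
    new[0] = births
    for a in range(k - 1):
        new[a + 1] += pop[a] * survival[a]
    new[k - 1] += pop[k - 1] * survival[k - 1]
    return [n + m for n, m in zip(new, net_migration)]


def project_step(females, males, surv_f, surv_m, fertility,
                 migr_f, migr_m, female_birth_share):
    """Project both sexes one step; fertility[a] is births per female in group a."""
    total_births = sum(f * w for f, w in zip(fertility, females))
    new_f = age_forward(females, surv_f, migr_f, total_births * female_birth_share)
    new_m = age_forward(males, surv_m, migr_m, total_births * (1 - female_birth_share))
    return new_f, new_m


# Toy example: three age groups, no deaths, no migration.
new_f, new_m = project_step(
    females=[100, 100, 100], males=[100, 100, 100],
    surv_f=[1.0, 1.0, 1.0], surv_m=[1.0, 1.0, 1.0],
    fertility=[0.0, 0.5, 0.0],
    migr_f=[0, 0, 0], migr_m=[0, 0, 0],
    female_birth_share=0.5,
)
```

Repeating the step carries the base population forward to each forecast date; a fuller treatment would also apply survival to the newborns and let the rates follow the trends listed in the abstract.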


198 JACK R. BORSTING. (University of Oregon). On the Addi- 
tion of Chi-Squares, Preliminary Report. 


An example is used to compare the power of three tests of a simple 
hypothesis specifying the cell frequencies in a discrete population. The 
tests are as follows: the chi-square test based on a single sample, a chi-
square test based on the addition of five independent chi-squares from
separate samples, and a test which rejects the hypothesis if any one of 
the several chi-squares is significantly large. For the example considered 
the power of the single sample test was considerably greater than the 
power of the other two tests. For example, for one alternative the power 
of the single sample test is 0.41 and the power of the test using the sum 
of chi-squares is 0.20. Curves are drawn comparing the powers for a set 
of alternatives. 
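
The comparison can be explored with a small Monte Carlo sketch. It does not reproduce the author's example, which the abstract does not specify. The assumptions here are illustrative: three equally likely cells under the hypothesis, a single sample formed by pooling the five subsamples, an arbitrary alternative, and a 5 percent level for every test. All function names are hypothetical. The closed-form tail probability used below holds only for even degrees of freedom.

```python
import math
import random


def chi2_sf_even(x, df):
    """Upper-tail probability of chi-square with even df (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def chi2_critical_even(alpha, df):
    """Critical value by bisection on the tail probability."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def pearson_chi2(counts, p0):
    """Pearson chi-square statistic against hypothesized cell probabilities."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, p0))


def draw_counts(n, p, rng):
    """Cell counts for one multinomial sample of size n."""
    counts = [0] * len(p)
    for cell in rng.choices(range(len(p)), weights=p, k=n):
        counts[cell] += 1
    return counts


def simulate_power(p0, p1, n_sub, n_parts=5, alpha=0.05, reps=2000, seed=1):
    """Estimate the power of the three tests when the data follow p1."""
    rng = random.Random(seed)
    df = len(p0) - 1  # must be even for the closed-form tail above
    crit_single = chi2_critical_even(alpha, df)
    crit_sum = chi2_critical_even(alpha, df * n_parts)
    # Per-sample level chosen so the "any one significant" test has level alpha.
    crit_any = chi2_critical_even(1 - (1 - alpha) ** (1 / n_parts), df)
    hits = {"single": 0, "sum": 0, "any": 0}
    for _ in range(reps):
        parts = [draw_counts(n_sub, p1, rng) for _ in range(n_parts)]
        stats = [pearson_chi2(c, p0) for c in parts]
        pooled = [sum(col) for col in zip(*parts)]
        hits["single"] += pearson_chi2(pooled, p0) > crit_single
        hits["sum"] += sum(stats) > crit_sum
        hits["any"] += max(stats) > crit_any
    return {name: h / reps for name, h in hits.items()}


power = simulate_power((1 / 3, 1 / 3, 1 / 3), (0.40, 0.33, 0.27), n_sub=40)
```

The pooled statistic and the sum of five statistics respond to the same departure from the hypothesis, but the sum is referred to a chi-square distribution with five times as many degrees of freedom. That is one way to see why the single-sample test would be expected to have the greater power.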
THE BIOMETRIC SOCIETY


Third International Biometric Conference. Members of the Society 
should start planning to attend the Third International Biometric 
Conference on September 1-5, 1953, at Bellagio on Lake Como in 
northern Italy. The conference has been timed so as to facilitate at- 
tendance at other international biological meetings, such as the 12th 
International Congress of Limnology in Cambridge, England on August 
20-26, the Ninth International Genetic Congress in Bellagio on August 
24-31, immediately preceding the Biometric Conference, the 28th Session 
of the International Statistical Institute in Rome on September 6-12, and
concurrently also in Rome, the Sixth International Microbiological 
Congress. The Organizing Committee for the Conference, of which the 
Executive Secretary is Dr. Luigi Cavalli, Istituto Sieroterapico Milanese, 
Via Darwin 20, Milano, Italy, is arranging sessions on the following 
topics: training in biometry, mathematical problems in genetics, indus- 
trial applications of experimental designs, public health and social 
medicine, biological assay with special reference to immunology, bio- 
metrical problems in ecology, functional relations in experimentation, 
methodological problems in biometry, and sequential experimentation. 
Several excursions will take advantage of the natural beauty of the 
region, and there will be opportunities to visit European biometrical 
laboratories before and after the Conference. All who have any possi- 
bility of attending are urged to write at once for further information to 
Dr. Cavalli or to the Secretary of the Society. 


ENAR. The Eastern North American Region cosponsored a five- 
week joint Biostatistics Conference with Iowa State College extending 
from June 16 to July 18. In successive weeks, two or three papers were 
presented each day for discussion in the following general areas: develop- 
ment of quantitative biology, specification of populations and their 
processes, the estimation of population, the estimation of biological 
effects, and biomathematical mechanisms within the individual and 
species. A more extended report will appear later. 

The symposium in this issue on “The Design and Interpretation of 
Clinical Experiments with Drugs” has been reprinted under separate 
cover and is available from the Secretary’s office at $1.40 per copy. 


WNAR. The Western North American Region met jointly with the 
Institute of Mathematical Statistics at the University of Oregon, in
Eugene, on June 19-20. Stochastic models were considered on the first
day with papers by A. R. Tunturi and by S. C. Dodd, R. J. Hill and 
S. Huffaker, with M. A. Girshick in the chair, followed by an invited 
address on “The quantification of evolutionary phenomena” by A. 
Buzzati-Traverso. On the second day, an afternoon program of seven 
contributed papers by C. O. Junge, Jr.; J. A. Jahn; D. G. Chapman; 
H. Tivey; G. A. Baker; C. F. Schmid, V. A. Miller and W. E. Kalbach;
and J. Borsting, was chaired by Mary Elveback. Abstracts of these 
papers appear in this issue. The annual business meeting named the 
following regional officers for 1953: Vice-President—B. M. Bennett, 
Secretary-Treasurer—D. G. Chapman, Regional Committee Members 
for 1953-55—R. Sitgreaves and G. A. Baker. 


British Region. The 14th meeting of the British Region was held 
on July 9-10 at the University of Edinburgh, both sessions on July 9 
being sponsored jointly with the British Pharmacological Society. The 
morning program on “The Design and Evaluation of Clinical Trials” 
with J. H. Gaddum in the chair offered papers by A. Bradford Hill, W. A. 
Bain, and C. A. Keele and a general discussion opened by E. J. Wayne. 
In the afternoon session on “Statistical Problems Arising in Biological 
Assay”, with D. J. Finney in the chair, seven papers were presented by 
P. Armitage, O. L. Davies, E. C. Fieller, M. J. R. Healy, J. O. Irwin, 
M. R. Sampford, and J. W. Trevan. The morning program on July 10 
considered “Biometry and Educational Research”, in papers by A. E. G.
Pilliner, Miss J. Forbes, Sir Godfrey H. Thomson and D. N. Lawley, with 
B. Babington Smith presiding. 


Australasian Region. The 1952 Biennial Conference of the Austra- 
lasian Region at the University of Sydney on August 26 formed part of 
the program of the A.N.Z.A.A.S. The following papers were presented: 
“The size of sample required to include all types of the population” by 
E. J. Williams, “The statistical study of animal populations” by P. A. 
Moran, “The contribution of differences between observers to some errors
of biological measurement” by Helen Turner, “Discrimination between
populations in which dimensions depend on age” by R. T. Leslie, and
“The estimations of genetic merit in dairy bulls” by J. M. Rendel. 

